Abstract

It is shown that in a sequence of randomly generated bipartite configurations with number of left nodes
approaching infinity, the probability that a particular configuration in the sequence has a minimum bisection width
proportional to the number of vertices in the configuration approaches 1 so long as a sufficient condition on
the node degree distribution is satisfied. This graph theory result implies an almost sure $\Omega(n^2)$ scaling rule for
the energy of capacity-approaching LDPC decoder circuits that directly instantiate their Tanner graphs and are
generated according to a uniform configuration model, where $n$ is the block length of the code. For a sequence of
circuits that have a full set of check nodes but do not necessarily directly instantiate a Tanner graph, this implies
an $\Omega(n^{1.5})$ scaling rule. In another theorem, it is shown that all (as opposed to almost all) capacity-approaching
LDPC decoding circuits that directly implement their Tanner graphs must have energy that scales as $\Omega(n (\log n)^2)$.
These results further imply scaling rules for the energy of LDPC decoder circuits as a function of gap to capacity.


I. Introduction 

Low density parity check (LDPC) codes are a class of codes first introduced by Gallager in [1]. This paper
finds fundamental lower bounds on the energy of VLSI implementations of capacity-approaching LDPC
decoders. Central to the construction and analysis of LDPC codes is the randomly generated Tanner graph
with a given degree distribution. A widely used method of analysis involves analyzing an ensemble of
LDPC codes whose Tanner graphs are generated according to some distribution. It has been shown that
there exist degree distributions that result in LDPC codes and decoders that can get arbitrarily close to
capacity for an erasure channel [2]. The first main result of this paper is an “almost sure” scaling rule
for the energy of capacity-approaching LDPC decoders whose Tanner graphs are generated according to
a uniform configuration model. The second main result of this paper is a scaling rule for the energy of
all, as opposed to almost all, capacity-approaching LDPC decoders. What we mean by an “almost sure”
and a “sure” scaling rule will be made more precise later in the paper.

To find energy-complexity lower bounds on a class of algorithms, a computation model is needed. We
use a standard circuit model that was first presented by Thompson in [3]. In this model, we consider
the energy of a circuit implementation of an algorithm to be the area of the circuit multiplied by the
number of clock cycles required to execute the algorithm. We will give a more detailed discussion of this
model later in the paper. The authors of [4] used the Thompson model to analyze the energy complexity
of all decoding algorithms by showing that as the target block error probability approaches 0, the total
energy must approach infinity. In [5] the authors showed that any fully-parallel decoding scheme that
asymptotically has block error probability less than 1/2 must have energy complexity which scales as
$\Omega(n \sqrt{\log n})$. These results, though general, do not suggest the existence of any decoder implementations
that reach these lower bounds. In this paper, we in particular show that the energy of LDPC decoding
schemes that directly implement their Tanner graphs cannot reach the $\Omega(n \sqrt{\log n})$ energy lower bound,
and in fact must have energy that scales at least as $\Omega(n (\log n)^2)$.

We begin the paper in Section II with a discussion of the graph theory used in the paper, and we
also discuss some prior work that reaches similar conclusions to our paper. Then, in Section III we
introduce graph theory definitions and the circuit model that we will use. We also present some important
lemmas that will be used in our theorems. Then, in Section IV, after defining some properties of node
degree distributions, we present the main theorem, which shows that almost all LDPC Tanner graphs have
minimum bisection width proportional to the number of vertices. We proceed to show how this theorem
allows us to find scaling laws for the energy of directly-implemented LDPC decoders in Section V. The
results presented in these sections are true for almost all LDPC decoders (i.e., for a set of decoders with
probability approaching one), but it is not clear whether there is a set of LDPC decoders of probability
approaching 0 that can approach capacity. Thus, in Section VI we present a theorem that relates the
number of edges and vertices in a graph to the area of its circuit instantiation to show a scaling rule
that is applicable to any LDPC decoding algorithm that approaches capacity. This results in a sure, as
opposed to almost sure, scaling law for the energy per iteration of a directly-instantiated LDPC decoder
of $\Omega(n (\log n)^2)$.


II. Background 

A. Related Work on LDPC Scaling Rules 

There are some results on fundamental limits on wiring complexity of LDPC decoders. In particular,
in [6], the authors assume that the average wire length in a VLSI instantiation of a Tanner graph is
proportional to the longest wire in an asymptotic sense, and that the longest wire is proportional to the
diagonal of the circuit upon which the LDPC decoder is laid out. The implication of these assumptions
is an $\Omega(n^2)$ scaling rule for the area of directly-implemented LDPC circuits, which is the same result as
this paper. However, these assumptions are taken as axioms without being fully justified; there certainly
can exist bipartite Tanner graphs that can be instantiated in a circuit without such area. The result of
this paper suggests that, in fact, the $\Omega(n^2)$ scaling rule is justified for almost all VLSI instantiations of
LDPC Tanner graphs as the block length of these LDPC codes grows large, where the Tanner graphs are
generated from a uniform configuration model and a sufficient condition on the node degree distributions
is satisfied. This scaling rule is an implication of the main theoretical contribution of this paper: a result in
random graph theory that we present as Theorem 1. In addition to this, we provide a super-linear energy
scaling rule for all directly-implemented LDPC decoders, even if the Tanner graph of such decoders is
not generated according to the uniform configuration model.

B. Related work on Graph Theory 

In graph theory, there are a number of results that study the minimum bisection width of graphs. Often
this work looks at a graph’s Laplacian, which is a matrix equal to the difference of the graph’s degree
matrix and adjacency matrix. In [7] a graph’s Laplacian is analyzed and it is shown that the second largest
eigenvalue, $\lambda_2$, can be used to find a lower bound of $\frac{\lambda_2 n}{4}$ on the graph’s minimum bisection width. In
[8], the authors find some bounds on the bisection width of graphs that are related to this $\lambda_2$ value. The
authors in [9] provide almost sure upper bounds for the bisection width of randomly generated regular
graphs. Our result does not consider the second greatest eigenvalue of the Laplacian of a graph to bound
the minimum bisection width. Instead, we use a purely combinatorial approach to reach our almost
sure lower bounds. Furthermore, our analysis is of random bipartite graphs, as opposed to random regular
graphs. As well, our result makes only weak assumptions on the node degree distribution to get our lower
bound, without requiring a degree-regularity assumption. The generality of the result allows us to apply
the theorem to find a scaling rule for the area of almost all capacity-approaching directly-implemented
LDPC decoding circuits.

Fig. 1. Example of two graphs with a minimum bisection labelled. Nodes are represented by circles and edges by lines joining the circles.
A dotted line crosses the edges of each graph that form a minimum bisection.


III. Definitions and Main Lemmas 

A. Graph Theory Definitions 

The main result of our paper involves the minimum bisection width of a graph. The minimum bisection 
width is a property of any graph. A bisection is a set of edges that once removed divides the graph into 
two subgraphs that have the same number of vertices. A formal definition is given below. 

Definition 1. Consider a graph $G$ with vertices $V$ and edges $E$. Let $E_s \subseteq E$ be a subset of the edges.
Then $E_s$ bisects $G$ if removal of $E_s$ cuts $V$ into unconnected sets $V_1$ and $V_2$ in which $\left| |V_1| - |V_2| \right| \le 1$.
A minimal bisection is a bisection of a graph whose size is minimal over all bisections. The minimum
bisection width is the size of a minimal bisection.


Generally speaking, finding the minimum bisection width of a graph is a difficult problem (it is in fact
NP-Complete [10]). The diagram in Fig. 1 shows minimal bisections of two simple graphs. Associated
with a bisection $E_s$ of a graph $G$ are two unconnected graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ induced
by the bisection. We will refer to the sets of vertices $V_1$ and $V_2$ each as a bisected set of vertices induced
by a bisection or, more compactly, a bisected set of vertices, where the association with the particular
bisection is to be implicit.
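
As an illustration of Definition 1, the following is a minimal brute-force sketch that computes the minimum bisection width of a very small graph by trying every balanced split of the vertices. The function name `min_bisection_width` and the two example graphs are hypothetical choices made for this sketch; the exhaustive search is exponential in the number of vertices and is meant only to make the definition concrete.

```python
from itertools import combinations


def min_bisection_width(vertices, edges):
    """Brute-force minimum bisection width of a small (multi)graph.

    vertices: iterable of hashable labels; edges: list of (u, v) pairs.
    Every split with ||V1| - |V2|| <= 1 is tried, and the edges crossing
    the split are counted. Illustration only: the cost is exponential.
    """
    vertices = list(vertices)
    n = len(vertices)
    best = None
    # Choosing floor(n/2) vertices for V1 covers every balanced split.
    for half in combinations(vertices, n // 2):
        v1 = set(half)
        cut = sum(1 for u, v in edges if (u in v1) != (v in v1))
        if best is None or cut < best:
            best = cut
    return best


# Hypothetical examples: a 4-cycle and the complete graph on 4 vertices.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
k4 = list(combinations(range(4), 2))
print(min_bisection_width(range(4), cycle4))  # 2
print(min_bisection_width(range(4), k4))      # 4
```

For a fixed split of the vertices, the smallest edge set whose removal disconnects the two sides is exactly the set of crossing edges, which is why counting crossing edges suffices here.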

Note that in this paper we will often consider dividing the vertices of a subset into two disjoint sets
$V_1$ and $V_2$ in which $\left| |V_1| - |V_2| \right| \le 1$. For convenience of discussion, we call this process dividing the
vertices in half. We make particular note of this to avoid having to distinguish in every case whether
the cardinality of the set of vertices in question is even or odd.


B. Circuit Model 

Central to our discussion is the relation between the minimum bisection width of a graph and the area (and
thus energy) of a circuit that implements that graph. Our discussion applies directly to LDPC decoders,
and within our model we must define an LDPC decoder, as well as a more general circuit. In this paper,
the definition of a circuit is adapted from Thompson [3] and is considered to be a mathematical object
consistent with the following circuit axioms. This model was also used in [4] to find bounds on the energy
complexity of encoding and decoding algorithms. We also provide a diagram of an example circuit in
Fig. 2.

• A circuit is a collection of nodes and wires laid out on a planar grid of squares. Each grid square
can be empty, can contain a computational node (sometimes referred to more simply as a node), a
wire, or a wire crossing. A circuit also has some special nodes called input nodes and also output
nodes. The purpose of a circuit is to compute a function $f : \{0,1\}^n \to \{0,1\}^k$. Such a circuit is said
to have $n$ inputs and $k$ outputs. The computation is divided into $\tau$ clock cycles. The inputs into a
computation are to be loaded into the input nodes, and the outputs are to appear in the output nodes
during some set clock cycle of the computation.

• Each grid square has width $\lambda_w$, known as the wire width, and thus has area $\lambda_w^2$. It is in this parameter
that this circuit model subsumes different VLSI implementation techniques. In real circuits, this
parameter may be a value like 14 nanometers. Our concern in this paper is not what this value is,
but rather in providing scaling rules in terms of the VLSI implementation technology used.

Fig. 2. Diagram of a possible VLSI circuit. Grid squares that are fully filled in represent computation nodes and the lines between them
represent wires.

• The computational nodes are the “computing” parts of the circuit. A node has at most 4 bidirectional
wires connected to it, which are used to feed bits into the node and feed out the bits computed
by the node. Each node is capable of computing a fixed function of the bits fed into it by the wires
connected to it during each clock cycle. In particular, a node with $l \le 4$ wires leading into it
can compute any function $g : \{0,1\}^l \to \{0,1\}^l$. However, a computational node is restricted to only
be able to compute the same function at each clock cycle. We note, of course, that the output of a
particular node could change with each clock cycle because, in general, the inputs into the function
could change with each clock cycle.

• The wires are the “communication” part of a circuit. Wires in a circuit are connections between 
computational nodes, and are assumed in our model to be bidirectional. At each clock cycle a wire 
can carry one bit in each direction. The bits communicated are an output of the function computed 
by the computational node to which the wire is connected. A wire can be placed in a grid square in 
a way that connects one edge of the grid square to some other edge. Thus, grid squares containing 
wires can be connected to form a wire leading from one node to another node. 

• An input node is a special node in the circuit. In addition to being able to compute any fixed function
mapping its $l \le 4$ inputs to its $l \le 4$ outputs, this node is also given an input bit into the circuit. In
general, at each clock cycle an input node can have as its input a new input into the function. Thus, we
say that inputs, in general, can be serialized; that is, they can be injected into the circuit at different
clock cycles of the computation. Usually it is assumed that the inputs into an input node are chosen
from the set $\{0,1\}$; however, sometimes (especially for the purpose of lower bounds) we can assume
that the inputs into an input node are chosen from a larger set of values. In [5] it was assumed that
an input node that is attached to $l$ wires can compute any function $g : \{0,1,?\} \times \{0,1\}^4 \to \{0,1\}^4$
(i.e., the node can perform any function of its 4-bit input from the wires connecting to it, as well
as its input, taken from the symbols $\{0,1,?\}$, where ? is considered
an erasure symbol). In our analysis we can assume that an erasure is a valid input as well; however,
this is not a central assumption of this paper and the results apply to inputs being taken from the set
$\{0,1\}$.

• An output node is another special node in a circuit. It is permitted to, like any other node, compute
any function of its inputs, but it is given an additional output. Thus, in the case of an output node
with $l \le 4$ wires leading from it, the output node can perform any function $g : \{0,1\}^l \to \{0,1\}^{l+1}$
where one of the bits in the output is distinguished as an output bit. The output node is required
to hold in its output bit some circuit output during set clock cycles. In a fully parallel computation
the output node is required to hold one output bit of the computation at the end of the computation,
but in general the outputs may be serialized, and one output node can be responsible for outputting
a number of the outputs of the computation, where each output has a specified clock cycle during
which it is to appear.

• A wire crossing in a circuit is a grid square that contains two wires that “cross” each other. An
example of a circuit with computational nodes, wires, and a wire crossing is given in Fig. 2.

• The normalized area of a circuit is the number of grid squares occupied, and it is denoted with
the symbol $A_c$. The number of grid squares occupied with nodes/wires is the normalized area of the
nodes/wires of the circuit, and is denoted $A_n$/$A_w$. Thus, the actual area of the circuit is $A = A_c \lambda_w^2$,
and the areas of the nodes/wires are defined similarly by multiplying the normalized value by $\lambda_w^2$, the
area of a unit grid square.

• The energy of a computation is proportional to the product of the area of the circuit and the number
of clock cycles. Real VLSI circuits are made of conducting material laid out essentially flat; thus,
in our model, we say that the capacitance of a circuit is proportional to its area. A circuit works
by, at every clock cycle, charging or discharging its wires. It is thus assumed that the energy of a
computation is proportional to $\frac{1}{2} C V_{dd}^2 \tau$, where $C = C_{\text{unit-area}} A$. Thus, we can denote the energy of a
computation as $E_{\text{comp}} = \xi_{\text{tech}} A_c \tau$, where $\xi_{\text{tech}} = \frac{1}{2} C_{\text{unit-area}} V_{dd}^2 \lambda_w^2$ is a constant that varies depending on
the technology used to implement the circuit. For decoder circuits we often denote the energy of
computation as $E_{\text{dec}}$, where the subscript indicates the type of computation performed by the circuit
under consideration.
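
The axioms above can be sketched in a few lines of code. The following is a minimal illustration, assuming a hypothetical text encoding of a layout in which `N` marks a computational node, `W` a wire, `X` a wire crossing, and `.` an empty grid square; the function names and the default technology constant of 1.0 are assumptions made for this sketch and are not part of the model itself.

```python
def normalized_area(grid):
    """Number of occupied grid squares ('.' marks an empty square)."""
    return sum(1 for row in grid for cell in row if cell != ".")


def computation_energy(grid, clock_cycles, xi_tech=1.0):
    """Energy modelled as (technology constant) * (occupied squares) * (clock cycles)."""
    return xi_tech * normalized_area(grid) * clock_cycles


# Hypothetical 4 x 5 layout: five nodes, five wire squares, one crossing.
layout = [
    "NWN..",
    "W.W..",
    "NWXWN",
    "..N..",
]
print(normalized_area(layout))        # 11
print(computation_energy(layout, 5))  # 55.0
```

Under this model, halving the number of clock cycles only saves energy if the area needed to do so less than doubles, which is the trade-off the lower bounds in this paper quantify.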

Note that the restriction that each node has at most four inputs and four outputs is somewhat arbitrary; 
it is also arbitrary that each node is permitted to compute any function of its inputs all at the same 
cost. In real VLSI implementations it may be that an arrangement of transistors can compute some 
functions more efficiently than others. However, our model does not consider what gains could be made if 
certain functions are cheaper in an energy sense to compute. On the other hand, the model subsumes the 
interconnection complexity of the inputs of the function to their outputs. In the field of error control codes,
this interconnection complexity has been shown to be a significant factor in the energy of a computation
in, for example, [11], [12].

C. Relationship Between Circuit Model and Graphs 

This paper analytically characterizes the energy of LDPC decoders as a function
of block length and gap to capacity. To do so, we must first define what is meant by an LDPC
decoder implemented according to the Thompson VLSI model, which in turn requires understanding
the connection between a circuit and the graph corresponding to that circuit.

Note that a circuit is a collection of nodes connected by wires. The computational nodes of a
circuit can be thought of as the vertices of a graph, $G = (V, E)$. The wires of a circuit correspond to the
edges of a graph. In particular, two vertices $v_1$ and $v_2$ are connected in the graph $G$ by an edge if and only
if there is a wire connecting the two computational nodes that correspond to $v_1$ and $v_2$. Thus, any circuit
can be considered a graph. As well, any graph can be implemented as a circuit (although of course there
may be many ways to implement a particular graph on a circuit). Note that although a circuit, according
to our model, must be planar, since we also allow wire crossings, any graph can be implemented, though
it may be that more complex graphs require far more circuit area.

Note that saying that a circuit has a corresponding graph is a slight abuse of terminology: a graph, 
according to common definitions, does not allow for two edges between the same nodes, but obviously 
two computational nodes are permitted to have two or more wires connecting them. More precisely, we 
mean that a circuit has a corresponding multi-graph. However, for the sake of simplicity we simply call a 
circuit’s corresponding multigraph a graph, and we hope that this does not cause confusion for the reader. 

Sometimes in our discussion we may want to refer not to a particular node of the circuit (corresponding
to the node of a graph), but rather to the nodes associated with a subcircuit, which leads to the following
definition.

A subcircuit is a circuit corresponding to a subset of nodes of the graph and the wires connecting
them. In particular, it is the circuit induced by deleting all wires not connecting the nodes of interest
and by deleting all the other nodes in the graph. Any subcircuit has associated with it both internal
wires (the wires connecting the nodes of this circuit) and also external wires, the wires leading from
nodes within the subcircuit to nodes from outside the subcircuit. Note that the notion of a subcircuit
corresponds to a particular subgraph of the graph of the circuit. In the language of graph theory [13],
we can say that a subcircuit with computational nodes corresponding to some subset $V' \subseteq V$
corresponds to the subgraph induced by the vertices in $V'$. Note that any subset of the computational
nodes of a graph induces a subcircuit and also a subgraph of the circuit’s graph.


D. LDPC Decoders 

An LDPC code is a linear code first invented by Gallager in [1]. All linear codes can be specified by a
parity check matrix. Central to the construction of LDPC codes is the Tanner graph of the code corresponding
to a parity check matrix of the code. A Tanner graph is a bipartite graph. Thus, such a graph has two
partite sets, or sets of unconnected vertices, which are referred to as the check nodes and the variable
nodes. An $(n, k)$ LDPC code has associated with it a Tanner graph with $n$ variable nodes and at least
$n - k$ check nodes (we say at least because it may be that some of the linear constraints induced by
the check nodes are not linearly independent). The $n$ variable nodes correspond to the $n$ symbols of a
block length $n$ codeword in the LDPC code. A codeword $c \in \{0,1\}^n$ is in the LDPC code generated by
a Tanner graph if, for each check node in the Tanner graph of the code, the mod 2 sum of the values
of the variable nodes to which it is connected is 0. The association of a set of linear constraints with
a Tanner graph leads to natural and very efficient methods of decoding that exploit the sparse nature of
the Tanner graph.
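
The membership condition can be made concrete with a short sketch. Here a Tanner graph is represented, as an assumption of this example, by a list `check_neighbors` whose entry `j` lists the variable nodes adjacent to check node `j`; the small graph used below is hypothetical and is not taken from any particular code.

```python
def is_codeword(check_neighbors, word):
    """True if every check node sees an even (mod 2 zero) sum of its neighbours."""
    return all(sum(word[i] for i in nbrs) % 2 == 0 for nbrs in check_neighbors)


# Hypothetical Tanner graph: n = 6 variable nodes, 3 check nodes.
checks = [[0, 1, 3], [1, 2, 4], [0, 2, 5]]
print(is_codeword(checks, [1, 1, 0, 0, 1, 1]))  # True
print(is_codeword(checks, [1, 0, 0, 0, 0, 0]))  # False
```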

An LDPC decoding algorithm associated with a Tanner graph is a message-passing procedure. Each
variable node is thought conceptually to be connected to its check nodes, and each check node
correspondingly to its variable nodes. In general, a variable node has as its inputs a message passed to it
from each of the check nodes to which it is connected, as well as the output of a noisy channel. A variable
node, in general, is able to compute any function of these inputs and pass the outputs of this computation
to its adjacent check nodes. The check nodes are similarly allowed to compute any function of their inputs
(which will be in general the outputs of the variable nodes to which they are connected). An iteration
of an LDPC decoder is one instance of this procedure: the variable nodes computing a function that is
then passed to the check nodes, and then the check nodes computing a function of these messages and
passing the output of these functions back to the variable nodes to which they are connected. A good
LDPC decoding algorithm should choose these functions well, so that, at the end of a certain number of
clock cycles $\tau$, the variable nodes hold within them an estimate of the original input into a noisy channel.
In the most general case, we allow the check and variable nodes to compute different functions of their
inputs during different iterations (i.e., the function they compute in general may vary in time). Gallager
discussed a variety of these message passing procedures in [1].
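
As one simple instance of such a procedure, the sketch below fills in erasures on a binary erasure channel: whenever a check node has exactly one erased neighbour, the parity constraint determines that bit. This particular rule is an illustrative choice and is not claimed to be the decoder analysed in this paper; the function name, the use of `None` for an erasure, and the small Tanner graph are hypothetical, and check nodes are visited one after another within an iteration for simplicity.

```python
def peel_erasures(check_neighbors, received, max_iterations=10):
    """Fill erasures (None) using checks that have exactly one erased neighbour."""
    word = list(received)
    for _ in range(max_iterations):
        progress = False
        for nbrs in check_neighbors:
            erased = [i for i in nbrs if word[i] is None]
            if len(erased) == 1:
                # The check's mod 2 sum must be 0, so the missing bit equals
                # the mod 2 sum of the known neighbours.
                known = sum(word[i] for i in nbrs if word[i] is not None)
                word[erased[0]] = known % 2
                progress = True
        if not progress:
            break
    return word


# Hypothetical Tanner graph (3 check nodes, 6 variable nodes) and received word.
checks = [[0, 1, 3], [1, 2, 4], [0, 2, 5]]
print(peel_erasures(checks, [None, 1, 0, 0, None, 1]))  # [1, 1, 0, 0, 1, 1]
```

The loop stops early when no check node can resolve a further erasure, in which case any remaining `None` entries mark bits this rule could not recover.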

To instantiate an LDPC decoding algorithm in a circuit, we consider two possible paradigms: a
directly-implemented technique, in which the Tanner graph of an LDPC code is directly instantiated in some sense
by the circuit, and a complete-check-node serialized technique, in which the Tanner graph is not necessarily
directly implemented, but there are subcircuits in the circuit corresponding to each check node and an LDPC
message passing procedure is performed.

A directly-instantiated LDPC decoder can be thought of as a circuit that has a graph that is an 
implementation of a Tanner graph of the underlying LDPC code. To be precise, we will use terminology 
borrowed from graph theory regarding the subdivision of a graph. 

Definition 2. Suppose a graph has an edge, $e$, connecting vertices $v_1$ and $v_2$. Then a subdivision of edge
$e$ in a graph is a process that takes the graph $G$ and forms a new graph $G'$ with an additional vertex $v'$
and two additional edges connecting $v_1$ and $v_2$ to $v'$ by replacing $e$ with two edges. A subdivision of a
graph $G$ is a graph obtained by the successive subdivisions of edges in the graph.
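
A single edge subdivision can be sketched as follows, assuming a graph is stored simply as a list of vertex pairs. The function name `subdivide_edge` and the triangle example are hypothetical and serve only to illustrate Definition 2.

```python
def subdivide_edge(edges, index, new_vertex):
    """Return a new edge list with edges[index] = (u, v) replaced by (u, w), (w, v)."""
    u, v = edges[index]
    return edges[:index] + [(u, new_vertex), (new_vertex, v)] + edges[index + 1:]


triangle = [("a", "b"), ("b", "c"), ("c", "a")]
print(subdivide_edge(triangle, 0, "w"))
# [('a', 'w'), ('w', 'b'), ('b', 'c'), ('c', 'a')]
```

Applying the function repeatedly, each time with a fresh vertex label, yields a subdivision of the original graph in the sense of the definition.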

If a graph $G$ has a subgraph that is a subdivision of a graph $G'$, then we say that the graph $G$ contains
graph $G'$. This leads to an important lemma that will allow us to connect bounds on graph properties of
a Tanner graph to the area of directly-implemented LDPC decoders.

Definition 3. A directly-implemented LDPC decoder is a circuit associated with an LDPC code with a 
Tanner graph T. Consider the graph associated with the circuit. Then a circuit is a directly-implemented 
LDPC decoder if its graph contains T. 

This means that a circuit is a directly-implemented LDPC decoder if there are subcircuits corresponding 
to each variable node and edges leading from these “black boxes” that connect to subcircuits that 
correspond to the check nodes of the Tanner graph. 

Associated with any graph $G$ is a quantity that we will call the minimum area of a circuit implementation
of $G$, or, to be more concise, the area of $G$. The area of a graph $G$ is the number of grid squares occupied
by the circuit with corresponding graph $G$ that occupies the fewest grid squares. We denote this quantity as $A_{\min}(G)$.

Lemma 1. If a graph $G$ contains a graph $G'$, then $A_{\min}(G) \ge A_{\min}(G')$.

Remark 1. This is a very intuitive idea. If a graph contains another graph, then naturally one would regard
the original graph as “larger” in some sense than the graph that it contains. This notion will be used to
connect a bound on the area of a circuit implementing the Tanner graph of an LDPC code to a bound on
directly-implemented LDPC decoders.

Proof: Suppose that $A_{\min}(G) < A_{\min}(G')$. Consider the circuit with minimal area that implements
$G$. We can use this circuit to construct a circuit for $G'$ with area less than $A_{\min}(G')$, resulting in a
contradiction. Since $G$ contains $G'$, there is a subgraph of $G$ that is a subdivision of $G'$. Consider the
subcircuit associated with that subgraph. Clearly, this subcircuit has area less than or equal to $A_{\min}(G)$.
Delete those nodes of this subgraph that correspond to subdivisions of edges of $G'$. On a circuit, this
corresponds to replacing a computational node with merely a wire. This process does not change the area
of this subcircuit, and it will result in a circuit for $G'$ with area less than $A_{\min}(G')$, a contradiction. ■

There is a key result attributed to Thompson [3] that relates a graph’s minimum bisection width to the
area of a circuit implementing that graph, presented in the following lemma.

Lemma 2. If a graph has minimum bisection width $\omega$, then the area of a circuit implementing this graph,
measured in grid squares, is lower bounded by

$$\frac{\omega^2}{4}.$$

Proof: See Thompson [3] for a detailed proof. ■
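
To see how Lemma 2 turns a bisection-width bound into an area bound, the following worked step may help. It assumes Thompson's bound in the form $A \ge \omega^2/4$ (area counted in grid squares) and a hypothetical constant $\beta > 0$ such that the minimum bisection width satisfies $\omega \ge \beta n$; the symbol $\beta$ is introduced only for this illustration.

```latex
\omega \ge \beta n
\quad\Longrightarrow\quad
A \;\ge\; \frac{\omega^2}{4} \;\ge\; \frac{\beta^2 n^2}{4} \;=\; \Omega(n^2).
```

A bisection width that grows linearly in the number of vertices therefore forces quadratic area, which is the mechanism behind the $n^2$ scaling rule discussed in this paper.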

Currently, our definition of a directly-implemented LDPC decoder subsumes many practical
implementations of LDPC decoding algorithms, but in practice circuits can be implemented that perform an LDPC
decoding algorithm and do not directly instantiate the Tanner graph of the code. This motivates the
following definition of a more general type of LDPC decoder.

Definition 4. An (n, k) complete-check-node LDPC decoder associated with Tanner graph T is a circuit 
with n separate subcircuits each corresponding to a variable node in T and one subcircuit corresponding to 
each check node in T. During one iteration a message must be passed from each variable-node subcircuit 
to each adjacent check-node subcircuit, and also from each check-node subcircuit to each adjacent variable- 
node subcircuit. To be precise, the check-node subcircuits that are adjacent to a variable-node subcircuit 
are those check-node subcircuits that correspond to check nodes in T that are adjacent to the variable 
node that corresponds to the variable-node subcircuit of interest. The variable-node subcircuits that are 
adjacent to a check-node subcircuit are defined similarly. 



Note that for such a circuit we do not require that a wire exists in the circuit for each edge in the Tanner 
graph. Thus, it is possible that a complete-check-node LDPC decoder can use the same wire multiple 
times, but in different clock cycles to communicate information during an iteration. 

Our results rely on the evaluation of some limits, which we present as lemmas below. 

Lemma 3. Suppose $P(n) = O(n^k)$ for some $k > 0$ and is positive for sufficiently large $n$, and there is
a sequence $n_1, n_2, \ldots$ that increases without bound. Then:

$$\lim_{i \to \infty} P(n_i) \exp\left(-n_i f(n_i)\right) = 0 \quad \text{if} \quad \lim_{n \to \infty} f(n) > 0.$$

Proof: Since $\lim_{n \to \infty} f(n) > 0$ and the sequence $n_i$ increases without bound, for sufficiently
large $i$, $f(n_i) > c$ for some $c > 0$ (in particular for any $c$ strictly less than the value of the limit). Then,
for sufficiently large $i$,

$$P(n_i) \exp\left(-n_i f(n_i)\right) < P(n_i) \exp\left(-c n_i\right).$$

Clearly, $\lim_{i \to \infty} P(n_i) \exp(-c n_i) = 0$, and because $P(n)$ is positive for large enough $n$, $P(n_i) \exp(-n_i f(n_i)) > 0$
for large enough $i$. The limit thus follows from the squeeze theorem. ■
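
A quick numerical illustration of Lemma 3 is sketched below. The specific choices $P(n) = n^3$ and $f(n) = 0.05 + 1/n$ (so that $f(n) \to 0.05 > 0$) are hypothetical and are picked only to show that the exponential factor eventually dominates the polynomial one, even though the product may grow at first.

```python
import math


def term(n, k=3, f=lambda n: 0.05 + 1.0 / n):
    """P(n) * exp(-n * f(n)) for the hypothetical choice P(n) = n**k."""
    return n ** k * math.exp(-n * f(n))


values = [term(n) for n in (10, 100, 1000, 10000)]
print(values)
```

With these choices the term rises between $n = 10$ and $n = 100$ before collapsing toward zero, which is consistent with the lemma being a statement about the limit only.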

Lemma 4. For any two positive integers $m$ and $n$ in which

$$m + n \le Y \tag{1}$$

for an integer $Y > 0$, where $Z \le Y$ and both $m \le Z$ and $n \le Z$,

$$m!\, n! \le Z!\, (Y - Z)!. \tag{2}$$

Proof: Since $m + n \le Y$, then $n = Y - m$ surely maximizes the product $m!\, n!$ (regardless of any
additional restriction on $n$). Suppose a possible choice of $m = Z - c$ and $n = Y - Z + c$, for some $c > 0$
in which $Y - Z + c \le Z$. We divide $Z!\, (Y - Z)!$ by the quantity $(Z - c)!\, (Y - Z + c)!$ and show that
this quantity is greater than or equal to 1, meaning that $Z!\, (Y - Z)!$ maximizes the product:

$$\frac{Z!\, (Y - Z)!}{(Z - c)!\, (Y - Z + c)!} = \frac{Z (Z - 1) \cdots (Z - c + 1)}{(Y - Z + c)(Y - Z + c - 1) \cdots (Y - Z + 1)}.$$

Note that the numerator and denominator have precisely $c$ terms. Since $Z \ge Y - Z + c$, the terms in
the numerator are strictly greater than a corresponding term in the denominator, unless $Y - Z + c = Z$,
but of course in this case the product is merely equal to the upper bound in (2). ■
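
The inequality in Lemma 4 can be checked exhaustively for small parameters. The sketch below assumes the lemma is read with $Z \le Y$ (so that $(Y - Z)!$ is defined), $1 \le m, n \le Z$, and $m + n \le Y$; the function name `lemma4_holds` and the tested range are choices made for this illustration.

```python
from math import factorial


def lemma4_holds(Y, Z):
    """Check m! n! <= Z! (Y - Z)! for all positive m, n <= Z with m + n <= Y."""
    bound = factorial(Z) * factorial(Y - Z)
    return all(
        factorial(m) * factorial(n) <= bound
        for m in range(1, Z + 1)
        for n in range(1, Z + 1)
        if m + n <= Y
    )


print(all(lemma4_holds(Y, Z) for Y in range(1, 13) for Z in range(1, Y + 1)))
```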

IV. Main Theorem 

Our main theorem is fundamentally graph-theoretic in nature and applies to graphs generated according 
to a standard uniform random configuration model. We present this theorem in a general form and then 
specialize it to create an “almost sure” scaling rule for capacity-approaching LDPC codes. 

Consider the set of bipartite graphs G = (V L II V R , E ) in which \V L \ = n, \V R \ = m, and with left node 
degree sequence A = (Ai, A 2 ,..., A n ) G (N) n and right node degree sequence P = (pi, p 2 ,..., p m ) G 
(N) m . In other words, for a particular graph in this set, A* is the degree of v t G Vl, the /'th left node in 
the graph, and pi is the degree of r, G V R , the /'th right node in the graph. Without loss of generality, 
assume that the degree sequences are ordered, i.e. that Ai < A 2 < ... < X n and p\ < p 2 < ... p m , and 
also, without loss of generality, assume n > m. Denote this set Q (A, P). Note that the number of edges 
in each particular graph in Q (A, P) is \E\ = Yn=i K = Y%L 1 Pi- 
For convenience of counting, we will consider not the set of graphs with a particular degree sequence,
but rather the set of configurations with this degree sequence. We can associate each node in a graph
with a number of sockets equal to its degree. Then, we can label each socket, so that, for example, the
first node in the left side of the bipartite graph would have sockets labelled $L_{11}, L_{12}, \ldots, L_{1\lambda_1}$, where the
symbol $L_{ij}$ is used to denote the $j$th socket on the $i$th left node. Thus, the $i$th left node would have
$\lambda_i$ sockets labelled $L_{i1}, L_{i2}, \ldots, L_{i\lambda_i}$. Also, the right nodes would have sockets labelled $R_{ij}$, where $R_{ij}$
denotes the $j$th socket on the $i$th right node. This node and socket configuration model is a standard way
to consider the set of bipartite graphs that form the Tanner graphs of LDPC ensembles, and in particular
is discussed at length in [14]. A multigraph together with a labelling of the sockets of each node is called
a configuration. Any particular left and right degree sequences $\Lambda$ and $P$ have associated with them the
set of all configurations with these node degree sequences, and this set is called the configuration space
associated with the degree sequences. Clearly, a configuration is determined by a permutation mapping the
$|E|$ left node sockets to the $|E|$ right node sockets. Note that there are $|E|!$ configurations within the space
of configurations with degree sequences $\Lambda$ and $P$. Let the set of configurations with degree sequences $\Lambda$
and $P$ be denoted $\mathcal{B}(\Lambda, P)$. Since a configuration is merely a graph with a labelling of sockets for each
node, graph properties can be extended to describe configurations in the natural way, including minimum
bisection width.
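The socket-permutation view of a configuration can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_configuration` and the 1-based `(node, socket)` labels are hypothetical choices made here to mirror the $L_{ij}$ and $R_{ij}$ labels, and the uniform draw is realized by shuffling the list of right sockets.

```python
import random


def sample_configuration(left_degrees, right_degrees, rng=random):
    """Draw one configuration uniformly at random (illustrative sketch).

    Sockets are labelled (node_index, socket_index), both 1-based, which
    mirrors the L_ij / R_ij labels used in the text. The result is a list
    of (left_socket, right_socket) pairs, i.e. a permutation of sockets.
    """
    left = [(i, j) for i, d in enumerate(left_degrees, 1) for j in range(1, d + 1)]
    right = [(i, j) for i, d in enumerate(right_degrees, 1) for j in range(1, d + 1)]
    if len(left) != len(right):
        raise ValueError("left and right degree sequences must have equal sums")
    # A uniformly random ordering of the right sockets is a uniformly
    # random permutation mapping left sockets to right sockets.
    rng.shuffle(right)
    return list(zip(left, right))


# Example: 4 left nodes of degree 2 and 2 right nodes of degree 4.
config = sample_configuration([2, 2, 2, 2], [4, 4], random.Random(0))
```

Each of the $|E|!$ permutations is equally likely under this procedure, which is the sense in which the configuration is drawn uniformly.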

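Define

$$B_a = \{G \in \mathcal{B}(\Lambda, P) : \exists \text{ a bisection } K \subseteq E \text{ such that } |K| = a\}$$

or in other words let $B_a$ be the set of configurations in $\mathcal{B}(\Lambda, P)$ that have a bisection of size $a$. Note that
$B_a$ does not represent the set of configurations in $\mathcal{B}(\Lambda, P)$ with minimum bisection width $a$, but rather
the set of configurations with any bisection of size $a$. Define $B_a^*$ to be the set of all configurations in $\mathcal{B}(\Lambda, P)$
that have a bisection of size $a$ or less, or in particular

$$B_a^* = \bigcup_{i=0}^{a} B_i.$$

Define

$$\delta_L(\Lambda) = \frac{1}{n} \sum_{i=\lfloor n/2 \rfloor + 1}^{n} \lambda_i \tag{3}$$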


(a function of a particular left degree sequence) and let

$$\sigma_L(\Lambda) = \frac{|E|}{n} - \delta_L.$$

We define these quantities so that any subset of half the left nodes can have at most $\delta_L n$ “sockets” leading
from these nodes. Similarly, define

$$\delta_R(P) = \frac{1}{m} \sum_{i=\lfloor m/2 \rfloor + 1}^{m} \rho_i$$

and

$$\sigma_R(P) = \frac{|E|}{m} - \delta_R.$$

The quantities $\delta_L(\Lambda)$ and $\sigma_L(\Lambda)$ are functions of the left degree distribution. As well, $\delta_R(P)$ and $\sigma_R(P)$
are functions of the right degree distribution. For convenience, we may sometimes denote these quantities
as $\delta_L$, $\sigma_L$, $\delta_R$ and $\sigma_R$, and their dependence on the degree distributions is to be implicit. Thus, it is clear
that the total number of edges in such a configuration is $\delta_L n + \sigma_L n = \delta_R m + \sigma_R m$. Define

$$\delta(\Lambda, P) = \frac{\max\left(n\,\delta_L(\Lambda),\; m\,\delta_R(P)\right)}{n} \tag{4}$$

and define

$$\sigma(\Lambda, P) = \frac{|E|}{n} - \delta.$$

For notational convenience we will abbreviate these two quantities as $\delta$ and $\sigma$ and their dependence on
the node degree distribution under discussion is to be implicit. Note that $|E| = \delta n + \sigma n$. These quantities
are defined so that in any subset of half the nodes ($\frac{n+m}{2}$ of them) of a configuration in $\mathcal{B}(\Lambda, P)$, the minimum of
the number of left sockets and right sockets cannot exceed $\delta n$. This observation will be useful in deriving
the bounds in this section and will be made more formal in Lemma 5.
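As a quick illustration of these definitions, the following Python sketch computes $\delta_L$, $\delta_R$, $\delta$ and $\sigma$ from a pair of degree sequences. The function name `degree_parameters` is hypothetical, and the sketch assumes that "half" of the nodes means $\lceil n/2 \rceil$ (respectively $\lceil m/2 \rceil$) of the highest-degree nodes when the count is odd; the text does not spell out the rounding.

```python
def degree_parameters(left_degrees, right_degrees):
    """Return (delta_L, delta_R, delta, sigma) for the given degree sequences.

    Assumption: "half the nodes" is rounded up when the node count is odd.
    """
    n, m = len(left_degrees), len(right_degrees)
    edges = sum(left_degrees)
    if edges != sum(right_degrees):
        raise ValueError("left and right degree sequences must have equal sums")
    # Sockets attached to the highest-degree half of each side.
    top_left = sorted(left_degrees, reverse=True)[: (n + 1) // 2]
    top_right = sorted(right_degrees, reverse=True)[: (m + 1) // 2]
    delta_l = sum(top_left) / n
    delta_r = sum(top_right) / m
    # delta normalizes the larger of the two socket counts by n.
    delta = max(n * delta_l, m * delta_r) / n
    sigma = edges / n - delta
    return delta_l, delta_r, delta, sigma


# Regular example: 4 left nodes of degree 6, 8 right nodes of degree 3.
print(degree_parameters([6] * 4, [3] * 8))  # (3.0, 1.5, 3.0, 3.0)
```

For this regular example the sketch gives $\delta = \sigma = 3$, so that $|E| = \delta n + \sigma n = 24$.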

Consider a given set of nodes $N \subseteq V$ for a bipartite multigraph as defined above, with left degree
sequence $\Lambda$ and right degree sequence $P$. For a given subset of vertices $N$ we can divide this set
into two disjoint sets, $N_L$ and $N_R$, where $N_L$ is the set of all those vertices in $N$ that are left nodes, and
$N_R$ all those vertices in $N$ that are right nodes. Let $L(N) = \sum_{v \in N_L} \deg(v)$ and $R(N) = \sum_{v \in N_R} \deg(v)$
be the number of “sockets” attached to the left nodes in $N$ and right nodes in $N$ respectively.

Lemma 5. For any bipartite multigraph $G = (V_L \cup V_R, E)$ with left degree sequence $\Lambda$ and right degree
sequence $P$, for any collection $N$ of $\frac{n+m}{2}$ vertices, $\min(L(N), R(N)) \le n\delta$.

Remark 2. We will use this lemma in a counting upper-bounding argument. Specifically, we will count 
the number of graph configurations that have a bisection of size a by dividing the vertices that form a 
graph into two equally-sized sets. The quantity min (L (N), R(N)) will be important for our counting 
bounds. 

Proof: Suppose not. This implies that both $L(N) > n\delta$ and $R(N) > n\delta$. Divide the vertices in
$N$ into the left nodes $N_L$ and right nodes $N_R$. It must be that $|N_L| + |N_R| = \frac{n+m}{2}$. Thus, it must be
that $|N_L| \le \frac{n}{2}$ or $|N_R| \le \frac{m}{2}$ (otherwise their sum would exceed $\frac{n+m}{2}$). Let us consider the case in which
$|N_L| \le \frac{n}{2}$ (the other case leads to an analogous argument). If $|N_L| \le \frac{n}{2}$ and $L(N) > n\delta$, then, in particular
$L(N) > n\delta_L(\Lambda)$ by the definition of $\delta$ in (4). But $n\delta_L(\Lambda)$ by definition (3) is the sum of the degrees of the
highest-degree half of the left nodes. The total degree of a collection of at most half of the left nodes cannot
exceed this quantity, leading to a contradiction. ■
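Lemma 5 can be checked by brute force on small degree sequences. The sketch below is an illustration rather than part of the proof: the function name `lemma5_holds` is hypothetical, subsets are taken to have $\lfloor (n+m)/2 \rfloor$ vertices, and "half" of each side is rounded up when computing $n\delta$; both rounding choices are assumptions made here.

```python
from itertools import combinations


def lemma5_holds(left_degrees, right_degrees):
    """Exhaustively check min(L(N), R(N)) <= n*delta for all half-size subsets."""
    n, m = len(left_degrees), len(right_degrees)
    half_left = sorted(left_degrees, reverse=True)[: (n + 1) // 2]
    half_right = sorted(right_degrees, reverse=True)[: (m + 1) // 2]
    # n * delta = max(n * delta_L, m * delta_R), following definition (4).
    n_delta = max(sum(half_left), sum(half_right))
    nodes = [("L", d) for d in left_degrees] + [("R", d) for d in right_degrees]
    for subset in combinations(nodes, (n + m) // 2):
        left_sockets = sum(d for side, d in subset if side == "L")
        right_sockets = sum(d for side, d in subset if side == "R")
        if min(left_sockets, right_sockets) > n_delta:
            return False
    return True


print(lemma5_holds([1, 2, 2, 3], [4, 4]))  # True
```

The number of subsets grows exponentially, so this check is only practical for a handful of nodes.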

Lemma 6. If a configuration $G = (V_L \cup V_R, E)$ with degree sequences $P$ and $\Lambda$ is generated according
to the uniform configuration model, then the probability that this configuration is in the set $B_a^*$ and hence
has a bisection of size $a$ or less, when

$$0 \le a \le \sigma n \tag{5}$$

is upper bounded by

$$P(B_a^*) \le \frac{(a+1)\, n^2 \binom{n}{n/2}^2 \binom{|E|}{a}^4\, a!\, (\delta n)!\, (\sigma n - a)!}{(\delta n + \sigma n)!}. \tag{6}$$


Proof: Follows from a straightforward counting upper-bounding technique given in the appendix. ■ 
This lemma can be used to prove our main theorem, which shows that if a sequence of node-and-socket
configurations is generated uniformly over all such configurations, and the quantities $\delta$ and $\sigma$
(quantities that could in general change with each element of the sequence) scale according to a particular
condition, then the probability that a configuration in this randomly generated sequence has a small
bisection (proportional to $n$ or less) approaches 0.

Our main theorem concerns sequences of random configurations. Specifically, we concern ourselves
with a sequence of random configurations $G_1, G_2, \ldots$ where each $G_i$ in the sequence is a configuration
generated according to the uniform configuration model, in which the $i$th configuration is drawn according
to node degree distributions $\Lambda_i$ and $P_i$. Note that the randomness for each element of such a sequence
does not come from the degree distributions: we are assuming that these distributions are fixed. It is the
interconnections between nodes that are random. We specifically concern ourselves with a sequence in
which the number of left nodes $n$ increases without bound. For such a sequence, denote the number of
left nodes of the $i$th configuration as $n_i$. We will abbreviate the quantities $\delta(\Lambda_i, P_i)$ and $\sigma(\Lambda_i, P_i)$ with
the symbols $\delta_i$ and $\sigma_i$ respectively, where we recall their definitions in (4) and in the equation following it. When the dependence
on $i$ is clear, the subscript for these symbols may be omitted for convenience.


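Theorem 1. Suppose that there is a sequence of randomly generated bipartite configurations with a series
of degree sequences in which the number of left nodes approaches infinity. If

$$\lim_{i \to \infty} \left[ 2H\left(\frac{1}{2}\right) + \delta_i \ln\left(\frac{\delta_i}{\delta_i + \sigma_i}\right) + \sigma_i \ln\left(\frac{\sigma_i}{\delta_i + \sigma_i}\right) \right] < 0 \tag{7}$$

then there exists some $\beta > 0$ for which

$$\lim_{i \to \infty} P\left(B^*_{\beta n_i}\right) = 0$$

and in particular, this occurs for any value of $0 < \beta < \sigma$ that satisfies:

$$\lim_{i \to \infty} \left[ 2H\left(\frac{1}{2}\right) + 4H\left(\frac{\beta}{\delta_i + \sigma_i}\right) + \beta \ln\left(\frac{\beta}{\sigma_i - \beta}\right) + \delta_i \ln\left(\frac{\delta_i}{\delta_i + \sigma_i}\right) + \sigma_i \ln\left(\frac{\sigma_i - \beta}{\delta_i + \sigma_i}\right) \right] < 0. \tag{8}$$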


Remark 3. This theorem says that subject to some condition on the average edge degrees of the configurations,
as these configurations get larger the probability that the configuration generated has a bisection
proportional to $n$ or less gets vanishingly small. We will use this result to show that for capacity-approaching
LDPC degree distributions, the minimum bisection width must be large in some sense,
implying that circuit implementations of these LDPC Tanner graphs must grow quickly as well, with high
probability. The condition in (7) recognizes that for a sequence of such graphs, the quantities $\delta$ and $\sigma$
could change with increasing $n$. If the condition is satisfied (which we will see for capacity-approaching
LDPC degree sequences it must) then with high probability the graphs do not have a “small” bisection.

Proof: (of Theorem 1) Consider first a specific random configuration in the sequence with block
length $n$ and node degree distributions that result in values for $\delta$ and $\sigma$. We will use the bounds of
Lemma 6 and then apply well known approximations. Firstly, we use the well known bounds that

$$e\left(\frac{n}{e}\right)^{n} \le n! \le e\left(\frac{n+1}{e}\right)^{n+1}$$

and that

$$\binom{n}{k} \le \exp\left(nH\left(\frac{k}{n}\right)\right)$$

where $H(x) = -x \log x - (1 - x) \log(1 - x)$. We use base $e$ as opposed to base 2 in order to conveniently
simplify the expressions that follow. Applying these bounds appropriately to the bound in Lemma 6 and
grouping terms that grow slower than $n$ into an arbitrary polynomial term $P(n)$ we get the following:


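$$\begin{aligned}
P(B_a^*) \le{} & P(n)\,(a+1) \exp\left(2nH\left(\frac{1}{2}\right) + 4nH\left(\frac{a}{|E|}\right)\right) \\
& \times \exp\left(a \ln\left(\frac{a+1}{e}\right) + \delta n \ln\left(\frac{\delta n + 1}{e}\right)\right) \\
& \times \exp\left((\sigma n - a) \ln\left(\frac{\sigma n - a + 1}{e}\right)\right) \\
& \times \exp\left(-(\delta n + \sigma n) \ln\left(\frac{\delta n + \sigma n}{e}\right)\right).
\end{aligned}$$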
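Expanding the last two terms in the exponent gives us:

$$\begin{aligned}
P(B_a^*) \le{} & P(n)\,(a+1) \exp\left(2nH\left(\frac{1}{2}\right) + 4nH\left(\frac{a}{|E|}\right)\right) \\
& \times \exp\left(a \ln\left(\frac{a+1}{e}\right) + \delta n \ln\left(\frac{\delta n + 1}{e}\right)\right) \\
& \times \exp\left((\sigma n) \ln\left(\frac{\sigma n - a + 1}{e}\right) - a \ln\left(\frac{\sigma n - a + 1}{e}\right)\right) \\
& \times \exp\left(-(\delta n) \ln\left(\frac{\delta n + \sigma n}{e}\right) - (\sigma n) \ln\left(\frac{\delta n + \sigma n}{e}\right)\right).
\end{aligned}$$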


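Factoring the terms in the exponent with an $a$ term, a $\delta n$ term, and a $\sigma n$ term gives us:

$$\begin{aligned}
P(B_a^*) \le{} & P(n)\,(a+1) \exp\left(2nH\left(\frac{1}{2}\right) + 4nH\left(\frac{a}{|E|}\right)\right) \\
& \times \exp\left(a \left(\ln\left(\frac{a+1}{e}\right) - \ln\left(\frac{\sigma n - a + 1}{e}\right)\right)\right) \\
& \times \exp\left((\sigma n) \left(\ln\left(\frac{\sigma n - a + 1}{e}\right) - \ln\left(\frac{\delta n + \sigma n}{e}\right)\right)\right) \\
& \times \exp\left((\delta n) \left(\ln\left(\frac{\delta n + 1}{e}\right) - \ln\left(\frac{\delta n + \sigma n}{e}\right)\right)\right).
\end{aligned}$$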


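Simplifying the logarithmic expressions in each line gives us:

$$\begin{aligned}
P(B_a^*) \le{} & P(n)\,(a+1) \exp\left(2nH\left(\frac{1}{2}\right) + 4nH\left(\frac{a}{|E|}\right)\right) \\
& \times \exp\left(a \ln\left(\frac{a+1}{\sigma n - a + 1}\right)\right) \\
& \times \exp\left((\sigma n) \ln\left(\frac{\sigma n - a + 1}{\delta n + \sigma n}\right)\right) \\
& \times \exp\left((\delta n) \ln\left(\frac{\delta n + 1}{\delta n + \sigma n}\right)\right).
\end{aligned}$$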

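We now let $a = \beta n$, which will satisfy the condition specified in (5) for $\beta < \sigma$. Making this substitution
and also using that $|E| = \delta n + \sigma n$ to expand the $|E|$ term in the first line of the expression gives us:

$$\begin{aligned}
P(B_{\beta n}^*) \le{} & P(n)\,(\beta n+1) \exp\left(2nH\left(\frac{1}{2}\right) + 4nH\left(\frac{\beta n}{\delta n + \sigma n}\right)\right) \\
& \times \exp\left(\beta n \ln\left(\frac{\beta n+1}{\sigma n - \beta n + 1}\right)\right) \\
& \times \exp\left((\sigma n) \ln\left(\frac{\sigma n - \beta n + 1}{\delta n + \sigma n}\right)\right) \\
& \times \exp\left((\delta n) \ln\left(\frac{\delta n + 1}{\delta n + \sigma n}\right)\right).
\end{aligned}$$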
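Simplifying each quotient within the logarithms, and grouping the $(\beta n + 1)$ term into our arbitrary
polynomial term:

$$\begin{aligned}
P(B_{\beta n}^*) \le{} & P(n) \exp\left(2nH\left(\frac{1}{2}\right) + 4nH\left(\frac{\beta}{\delta + \sigma}\right)\right) \\
& \times \exp\left(\beta n \ln\left(\frac{\beta + \frac{1}{n}}{\sigma - \beta + \frac{1}{n}}\right)\right) \\
& \times \exp\left((\sigma n) \ln\left(\frac{\sigma - \beta + \frac{1}{n}}{\delta + \sigma}\right)\right) \\
& \times \exp\left((\delta n) \ln\left(\frac{\delta + \frac{1}{n}}{\delta + \sigma}\right)\right).
\end{aligned}$$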


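By factoring the $n$ term and by applying Lemma 3 we see that the above expression will approach 0 if

$$\lim_{i \to \infty} \left[ 2H\left(\frac{1}{2}\right) + 4H\left(\frac{\beta}{\delta + \sigma}\right) + \beta \ln\left(\frac{\beta + \frac{1}{n}}{\sigma - \beta + \frac{1}{n}}\right) + \sigma \ln\left(\frac{\sigma - \beta + \frac{1}{n}}{\delta + \sigma}\right) + \delta \ln\left(\frac{\delta + \frac{1}{n}}{\delta + \sigma}\right) \right] < 0$$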


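where we recall again that the dependence on $i$ in this expression comes from the $n$ terms and the $\delta$ and
$\sigma$ terms (whose dependence on $i$ we have suppressed for notational compactness). This is true if

$$\lim_{i \to \infty} \left[ 2H\left(\frac{1}{2}\right) + 4H\left(\frac{\beta}{\delta + \sigma}\right) + \beta \ln\left(\frac{\beta}{\sigma - \beta}\right) + \sigma \ln\left(\frac{\sigma - \beta}{\delta + \sigma}\right) + \delta \ln\left(\frac{\delta}{\delta + \sigma}\right) \right] < 0.$$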


Also note that this is the condition on $\beta$ given in (8). To derive the condition in (7), we find the limit as
$\beta$ approaches 0 of this expression, treating the other terms as constants, giving us:

$$\lim_{i \to \infty} \left[ 2H\left(\frac{1}{2}\right) + \sigma \ln\left(\frac{\sigma}{\delta + \sigma}\right) + \delta \ln\left(\frac{\delta}{\delta + \sigma}\right) \right] < 0$$

where we have applied the easily verifiable facts that $\lim_{x \to 0} H(x) = 0$ and $\lim_{x \to 0} x \ln(x) = 0$ to
get rid of the second and third terms in the expression. Thus, if this condition is satisfied, by the definition
of a limit, there exists a sufficiently small $\beta$ for which $\lim_{i \to \infty} P(B^*_{\beta n}) = 0$. ■
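The factorial and binomial bounds used at the start of this proof can be sanity-checked numerically. The sketch below is an illustration only: the function names are hypothetical, the bounds are taken to be $e(n/e)^n \le n! \le e((n+1)/e)^{n+1}$ and $\binom{n}{k} \le \exp(nH(k/n))$ with $H$ in nats, and a small tolerance absorbs floating-point error.

```python
import math


def entropy(x):
    """Binary entropy in nats, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


def factorial_bounds_hold(n, tol=1e-9):
    """Check e (n/e)^n <= n! <= e ((n+1)/e)^(n+1) in the log domain."""
    log_factorial = math.lgamma(n + 1)
    lower = 1 + n * (math.log(n) - 1)
    upper = 1 + (n + 1) * (math.log(n + 1) - 1)
    return lower <= log_factorial + tol and log_factorial <= upper + tol


def binomial_bound_holds(n, k, tol=1e-9):
    """Check C(n, k) <= exp(n H(k/n)) in the log domain."""
    return math.log(math.comb(n, k)) <= n * entropy(k / n) + tol


print(all(factorial_bounds_hold(n) for n in range(1, 200)))
print(all(binomial_bound_holds(n, k) for n in range(1, 60) for k in range(n + 1)))
```

Working with logarithms avoids overflow for the larger values of $n$.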

As we are considering a sequence of configurations, we let $\omega_i$ be the minimum bisection width of the
$i$th configuration. This theorem has an obvious corollary.


Corollary 1. If there is a sequence of configurations as described in Theorem 1 in which the condition
in (7) is satisfied then $\lim_{i \to \infty} P(\omega_i \ge \beta n_i) = 1$ for some $\beta > 0$.

Proof: Note that the event $B_a^*$ is the event that a random configuration has a bisection of size $a$ or
less. The complement of this event is the event that a random configuration has no bisection of size $a$ or
less, and thus is the event that a random configuration has minimum bisection width greater than $a$, which
in particular is at least $a$. The corollary follows directly from this observation. ■
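For very small degree sequences, the distribution of the minimum bisection width under the uniform configuration model can be computed exhaustively. The following Python sketch is an illustration under stated assumptions: a bisection is taken to be a split of the vertices into two sets whose sizes differ by at most one, its width is the number of edges crossing the split, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations


def min_bisection_width(num_vertices, edges):
    """Smallest number of crossing edges over all balanced vertex splits."""
    best = None
    for top in combinations(range(num_vertices), num_vertices // 2):
        top = set(top)
        cut = sum((u in top) != (v in top) for u, v in edges)
        best = cut if best is None else min(best, cut)
    return best


def width_distribution(left_degrees, right_degrees):
    """Count configurations by minimum bisection width (all |E|! permutations)."""
    n = len(left_degrees)
    # Owner node of each socket; right nodes are numbered after left nodes.
    left_owner = [i for i, d in enumerate(left_degrees) for _ in range(d)]
    right_owner = [n + i for i, d in enumerate(right_degrees) for _ in range(d)]
    counts = Counter()
    for perm in permutations(range(len(left_owner))):
        edges = [(left_owner[s], right_owner[t]) for s, t in enumerate(perm)]
        counts[min_bisection_width(n + len(right_degrees), edges)] += 1
    return counts


counts = width_distribution([2, 2], [2, 2])
print(dict(sorted(counts.items())))  # {0: 8, 2: 16}
```

In this toy case, 8 of the 24 configurations split into two disconnected double edges (width 0) and the remaining 16 form a 4-cycle (width 2). The enumeration is factorial in $|E|$, so it only serves to make the definitions concrete.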
A. Application to a Specific Sequence of Random Configurations

Our result in Theorem 1 can be directly applied to the Tanner graphs of specific sequences of LDPC
codes. For example, consider a regular LDPC ensemble with variable node degree 6 and check node degree
3. A randomly generated Tanner graph with this degree distribution would have $\delta n = 6 \cdot \frac{n}{2} = 3n$
and $\sigma n = |E| - 3n = 3n$. In this case we can compute that the condition in (7) evaluates to:

$$2H\left(\frac{1}{2}\right) + \delta \ln\left(\frac{\delta}{\delta + \sigma}\right) + \sigma \ln\left(\frac{\sigma}{\delta + \sigma}\right) = 2H\left(\frac{1}{2}\right) + 3 \ln\left(\frac{3}{3 + 3}\right) + 3 \ln\left(\frac{3}{3 + 3}\right) \approx -2.77$$

which we see is less than 0. Thus, since the condition (7) is satisfied, our theorem implies that
if random Tanner graphs are generated with this degree distribution, with probability approaching 1 the
minimum bisection width of these graphs will be proportional to $n$.
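The arithmetic in this example is easy to reproduce. The sketch below evaluates the left-hand sides of the two conditions of Theorem 1 for fixed $\delta$ and $\sigma$, assuming they read $2H(1/2) + \delta\ln\frac{\delta}{\delta+\sigma} + \sigma\ln\frac{\sigma}{\delta+\sigma}$ for (7) and $2H(1/2) + 4H\left(\frac{\beta}{\delta+\sigma}\right) + \beta\ln\frac{\beta}{\sigma-\beta} + \delta\ln\frac{\delta}{\delta+\sigma} + \sigma\ln\frac{\sigma-\beta}{\delta+\sigma}$ for (8), with $H$ in nats. The function names are hypothetical.

```python
import math


def entropy(x):
    """Binary entropy in nats, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


def condition_7(delta, sigma):
    """Left-hand side of condition (7) for fixed delta and sigma."""
    total = delta + sigma
    return (2 * entropy(0.5)
            + delta * math.log(delta / total)
            + sigma * math.log(sigma / total))


def condition_8(delta, sigma, beta):
    """Left-hand side of condition (8); requires 0 < beta < sigma."""
    total = delta + sigma
    return (2 * entropy(0.5)
            + 4 * entropy(beta / total)
            + beta * math.log(beta / (sigma - beta))
            + delta * math.log(delta / total)
            + sigma * math.log((sigma - beta) / total))


print(round(condition_7(3, 3), 2))  # -2.77
print(condition_8(3, 3, 0.01) < 0)  # True
```

For $\delta = \sigma = 3$ the first value equals $-4\ln 2$, and a small $\beta$ such as 0.01 (an arbitrary choice for illustration) also satisfies the second condition.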


V. Almost Sure Bounds on Capacity Approaching LDPC Circuits 


We will use the result above to find an “almost sure” scaling rule for the energy of a capacity-approaching
directly-implemented decoding scheme in which the Tanner graph of each decoder is generated
according to a uniform configuration model with a set node degree distribution.

Consider a decoding scheme $C_1, C_2, \ldots$ in which each of the decoders in the scheme is a directly-implemented
LDPC decoder, as in Definition 3. We associate a scheme with a channel that the decoders
are to decode. Let the capacity of that channel be $C$. Let the $i$th decoder have associated block length
$n_i$. Let the rate associated with the $i$th decoder be $R_i$. Let the fraction of capacity associated with the $i$th
decoder, which measures its gap to capacity, be $\eta_i = R_i / C$. Let the area of the $i$th decoder be $A_i$, and the energy of the $i$th decoder be $E_i$. Let the
minimum bisection width of the Tanner graph of the $i$th decoder be $\omega_i$. We consider a family of LDPC
decoding schemes in which the Tanner graph of each decoder in the scheme is generated according to
a uniform configuration model. Thus, we say that the Tanner graph of decoder $i$ is generated uniformly
from a family $\mathcal{B}_i(\Lambda, P)$ of configurations. We can thus discuss the probability of the $i$th decoder having
certain properties. In particular, in the corollary below, we will analyze $P(\omega_i \ge \beta n_i)$, the probability that
the $i$th decoder has a Tanner graph with minimum bisection width at least $\beta n_i$, and show that this
approaches 1, resulting in an almost sure energy scaling rule for capacity-approaching LDPC decoders.
We let the event that the $i$th decoder has a bisection of size $a$ or less be $B_a^*$.


Corollary 2. For a family of capacity-approaching directly-implemented LDPC decoding schemes where
the Tanner graph of each decoder is generated according to a uniform configuration model,
$\lim_{i \to \infty} P\left(A_i \ge c n_i^2\right) = 1$ for some constant $c > 0$. Similarly,
$\lim_{i \to \infty} P\left(A_i \ge \frac{c'}{(1 - \eta_i)^4}\right) = 1$ for a constant $c' > 0$.


Proof: Note that a Tanner graph is in fact a bipartite graph as described in Theorem 1 in which
the block length corresponds to $n$ and the number of checks corresponds to $m$. For a sequence of LDPC
codes to approach capacity, the result in [15] implies that

$$\frac{|E|}{n(1 - R)} \ge \Omega\left(\ln \frac{1}{1 - \eta}\right).$$

Thus, as capacity is approached, the number of edges per node must approach infinity, and thus the
quantity $\delta$ must approach infinity. We can thus show that the expression:

$$2H\left(\frac{1}{2}\right) + \delta \ln\left(\frac{\delta}{\delta + \sigma}\right) + \sigma \ln\left(\frac{\sigma}{\delta + \sigma}\right) < 0 \tag{9}$$
must be satisfied for sufficient closeness to capacity.

To see this, note that $\delta$ approaches $\infty$ for a capacity-approaching code. What happens to $\sigma$ is either
(a) $\lim_{i \to \infty} \frac{\delta}{\delta + \sigma} < 1$ or (b) $\lim_{i \to \infty} \frac{\delta}{\delta + \sigma} = 1$, or (c) this limit does not exist. Note that this value cannot
exceed 1 because necessarily $\delta \le \delta + \sigma$.

In the case of (c), it must be that the value of $\sigma$ alternates and no limit can be defined. In this case,
however, we should consider the specific subsequences of decoders in which either (a) or (b) applies.
Since for each subsequence the appropriate scaling rule holds, it must be true for
the entire sequence.

In case (a): In the limit, $\ln\left(\frac{\delta}{\delta + \sigma}\right) < 0$ and so $\delta \ln\left(\frac{\delta}{\delta + \sigma}\right) \to -\infty$ as $\delta$ approaches $\infty$. Since
$\sigma \ln\left(\frac{\sigma}{\delta + \sigma}\right) < 0$ in any case (a consequence of $\sigma \le \delta$), in the limit the inequality (9) will be
satisfied.

For case (b), in which $\ln\left(\frac{\sigma}{\delta + \sigma}\right) \to -\infty$, note that $\sigma$ is positive, so $\sigma \ln\left(\frac{\sigma}{\delta + \sigma}\right) \to -\infty$, and thus in
the limit (9) will also be satisfied.

Note that each Tanner graph in the sequence under consideration is generated according to the uniform
configuration model. Since the sequence is capacity approaching, by the argument above the node degree
distributions satisfy the sufficient condition of Theorem 1. Thus, by applying Corollary 1,

$$\lim_{i \to \infty} P(\omega_i \ge \beta n_i) = 1. \tag{10}$$

We combine this result with Thompson’s [3] result presented in Lemma 2 that the area of a VLSI
instantiation of a graph with minimum bisection width $\omega$ is lower bounded by $A_c \ge \frac{\lambda_w^2 \omega^2}{4}$. Thus, the event
that $\omega_i \ge \beta n_i$ implies that $A_i \ge \frac{\lambda_w^2 \beta^2 n_i^2}{4}$ and thus,

$$\lim_{i \to \infty} P\left(A_i \ge \frac{\lambda_w^2 \beta^2 n_i^2}{4}\right) = 1$$

as expressed in the corollary statement.

This result can be used to understand how the area of almost all circuits that instantiate random Tanner
graphs of LDPC codes must scale as capacity is approached. It is well known from [16], [17] that, as a
function of fraction of capacity $\eta = R/C$, the minimum block length required for any code scales as:

$$n \approx \frac{b}{(1 - \eta)^2}$$

for a constant $b$ that depends on the channel statistics and also the target probabilities of error. We are
not concerned with the value of this constant but rather the dependence of this expression on $\eta$.

We use this to note that, if $\omega_i \ge \beta n_i$, then, recognizing from Definition 3 that a directly-instantiated
LDPC decoder must contain its Tanner graph, and also applying Lemma 1 which says that a circuit must
be bigger than the minimum area of a circuit instantiation of a graph that the circuit contains,

$$A_c \ge \frac{\lambda_w^2 \omega^2}{4} \ge \frac{\lambda_w^2 \beta^2 n^2}{4} \ge \frac{\lambda_w^2 \beta^2 b^2}{4 (1 - \eta)^4}.$$

Combining this observation with the result in (10) results in

$$\lim_{i \to \infty} P\left(A_i \ge \frac{c'}{(1 - \eta_i)^4}\right) = 1$$

for a constant $c' > 0$, finishing the proof.
Applicability of this Result

There is a minor detail that needs to be dealt with for this theorem to be truly useful. Our results 
assume that a Tanner graph is directly implemented in wires. This is indeed a practical way to create a 
decoding circuit. However, according to our configuration model, it is possible that two or more edges 
can be drawn between the same two nodes. This type of conflict is usually dealt with by deleting even 
multi-edges and replacing odd multi-edges with a single edge (see definition 3.15, the Standard LDPC 
Ensemble in [14]). This leads to a potential problem with the applicability of our theorem: what happens 
if the edges that we delete form a minimum bisection of the induced graph? In that case it is possible 
that the graph we instantiate on the circuit has a lower minimum bisection width than that which we 
calculated, and thus could possibly have less area. However, this is resolved by the fact that in the limit 
as n approaches infinity for a standard LDPC ensemble, the graph is locally tree-like (Theorem 3.49 in
[14]) with probability approaching 1. This implies that the probability that the number of multi-edges
in a randomly generated configuration is some fraction of n must approach 0 (or else the graph would 
not be locally tree-like, contradicting the theorem). Hence, even if we did delete these multi-edges from 
the randomly generated configuration, this could at most decrease the minimum bisection width by the 
number of deletions, but this number of deletions, with probability 1, cannot grow linearly with n. Hence, 
the minimum bisection width must still, with probability 1 , grow linearly with n, and our scaling rules 
are still applicable. 
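The multi-edge convention described above (delete even multi-edges, keep a single edge for odd multi-edges) can be sketched as follows. The function names and the representation of edges as `(left_node, right_node)` pairs are illustrative assumptions, not taken from [14].

```python
from collections import Counter


def reduce_multi_edges(edges):
    """Keep one edge for each node pair of odd multiplicity; drop even ones."""
    multiplicity = Counter(edges)
    return sorted(pair for pair, count in multiplicity.items() if count % 2 == 1)


def deleted_edge_count(edges):
    """Number of edges removed by the reduction.

    The minimum bisection width can drop by at most this amount, since each
    removed edge contributes at most one to the width of any bisection.
    """
    return len(edges) - len(reduce_multi_edges(edges))


edges = [(0, 0), (0, 0), (1, 0), (1, 1), (1, 1), (1, 1)]
print(reduce_multi_edges(edges))  # [(1, 0), (1, 1)]
print(deleted_edge_count(edges))  # 4
```

If the number of deleted edges grows sublinearly in $n$, a bisection width that is linear in $n$ before the reduction remains linear after it, which is the point of the argument above.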


A. Energy Complexity of Capacity Approaching Complete-Check-Node LDPC Decoders 

Below we will consider a sequence of capacity-approaching, complete-check-node serialized decoders. 
Recall that these decoders do not directly instantiate their Tanner graph in wires, but they do have 
subcircuits corresponding to each check and variable node. In each iteration, possibly over several clock 
cycles, messages are to be passed from each variable node subcircuit to their corresponding check node 
subcircuit and similarly for the check node subcircuits passing messages to their corresponding variable 
node subcircuits. It may be that the same wire is used to transmit different messages during different 
clock cycles of the same iteration of the computation. It is thus possible that such a method can decrease 
wiring area (by not requiring a wire for each edge of the Tanner graph) at the cost of more clock cycles. 
We prove below that such a method still results in a super-linear almost sure lower bound on energy 
complexity. To avoid ambiguity, a sequence of decoders for a channel with capacity $C$ with rates
$R_1, R_2, \ldots$ is capacity-approaching if $\lim_{i \to \infty} R_i = C$.

Corollary 3. For a sequence of capacity-approaching, complete-check-node serialized LDPC decoders
whose Tanner graphs are generated according to the uniform configuration model, $\lim_{i \to \infty} P\left(E_i \ge c n_i^{1.5}\right) = 1$
for some $c > 0$. Also, $\lim_{i \to \infty} P\left(E_i \ge \frac{c'}{(1 - \eta_i)^3}\right) = 1$ and $\lim_{i \to \infty} P\left(\frac{E_i}{k_i} \ge \frac{c''}{1 - \eta_i}\right) = 1$.

Proof: In considering a complete-check-node serialized LDPC decoder, we note that such a decoder
contains a graph with $n$ variable nodes and at least $n - k$ check nodes. We will use arguments similar
to those used by Thompson [3] and Grover [4]. Let the minimum bisection width of the Tanner graph
associated with the $i$th decoder be $\omega_i$. Suppose that the graph of the circuit implementing this decoder
has minimum bisection width $W_i$ (we use the symbol $W_i$ to distinguish this from the minimum bisection
width $\omega_i$ of the Tanner graph of the $i$th decoder, recalling that we do not require in this case that the
circuit contains the underlying Tanner graph). Thus, in one iteration, the number of bits communicated
across any bisection of the nodes must be at least $\omega_i$. One iteration must be performed, but since the
minimum bisection width of the graph associated with this circuit is $W_i$, this requires that more clock
cycles are used to pass the information between the check and variable nodes, and in particular, with $T_i$
denoting the number of clock cycles used,

$$T_i W_i \ge \omega_i. \tag{11}$$

We also know from Lemma 2 that $A_i \ge \frac{\lambda_w^2 W_i^2}{4}$, and so combining with the inequality in (11) gives us:
$$A_i T_i^2 \ge \frac{\lambda_w^2 W_i^2 T_i^2}{4} \ge \frac{\lambda_w^2 \omega_i^2}{4}. \tag{12}$$

Trivially, because there are $n_i$ variable node subcircuits in the circuit, $A_i \ge n_i$ and thus combining with
(12) we get

$$A_i^2 T_i^2 \ge \frac{\lambda_w^2 \omega_i^2 n_i}{4}$$

and thus, taking the square root of both sides of this inequality,

$$A_i T_i \ge \frac{\lambda_w \omega_i n_i^{0.5}}{2}.$$

Since energy is proportional to the product of circuit area and number of clock cycles, this implies that
for each decoder in the sequence

$$E_i \ge \xi_{tech} \frac{\lambda_w \omega_i n_i^{0.5}}{2}$$

for the constant $\xi_{tech}$ that depends on the specific technology used to implement the circuits.

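Using the same arguments as Corollary 2 we can show that for a capacity-approaching LDPC scheme
$\lim_{i \to \infty} P(\omega_i \ge \beta n_i) = 1$ for some $\beta > 0$. Following the logic above, the event that $\omega_i \ge \beta n_i$ implies
$E_i \ge \xi_{tech} \frac{\lambda_w \beta}{2} n_i^{1.5}$, which thus implies $\lim_{i \to \infty} P\left(E_i \ge c n_i^{1.5}\right) = 1$ for some constant $c > 0$. Also, following
the same logic as in Corollary 2, $\lim_{i \to \infty} P\left(E_i \ge \frac{c'}{(1 - \eta_i)^3}\right) = 1$ and
$\lim_{i \to \infty} P\left(\frac{E_i}{k_i} \ge \frac{c''}{1 - \eta_i}\right) = 1$.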
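The chain of inequalities in this proof can be made concrete with a small numeric sketch. It assumes the bound takes the form $E_i \ge \xi_{tech} \lambda_w \omega_i \sqrt{n_i} / 2$ with $\omega_i = \beta n_i$; the function name and the default values of the wire width and technology constant are placeholders chosen here for illustration.

```python
import math


def energy_lower_bound(n, beta, wire_width=1.0, xi_tech=1.0):
    """Lower bound xi_tech * lambda_w * omega * sqrt(n) / 2 with omega = beta * n.

    wire_width (lambda_w) and xi_tech are placeholder constants; only the
    dependence on n is of interest here.
    """
    omega = beta * n  # assumed minimum bisection width of the Tanner graph
    area_time = wire_width * omega * math.sqrt(n) / 2  # bound on A_i * T_i
    return xi_tech * area_time


# Quadrupling the block length multiplies the bound by 4 ** 1.5 = 8.
ratio = energy_lower_bound(400, 0.5) / energy_lower_bound(100, 0.5)
print(ratio)
```

The factor of 8 reflects the $n^{1.5}$ scaling: the bisection width contributes a factor of $n$ and the trivial area bound contributes $\sqrt{n}$.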


B. Limitations of Result 

A goal of this research is to find fundamental bounds on the “energy complexity” of capacity-approaching
decoders as a function of $\eta = R/C$. The result presented here does not quite do this, but it does advise
engineering by suggesting that if $n$ is very large, one can be reasonably sure that a circuit that
instantiates a randomly generated Tanner graph will have area that scales as $\Omega(n^2)$. Of course, we have
assumed that this Tanner graph has been generated by going to each socket of the left nodes and randomly
finding a connection to a remaining right socket. This is of course a very natural way to generate a Tanner
graph, and is in fact used in the analysis of LDPC codes [14].

This is not to say, of course, that there don’t exist good LDPC coding schemes with slower scaling laws. 
Creating a sequence of LDPC codes that avoids this scaling law with probability greater than 0 would be 
possible if the random generation rule for the LDPC graph was somehow altered. For example, perhaps 
the variable nodes and check nodes could be placed uniformly scattered through a grid and then the 
randomly placed edges, instead of being chosen uniformly over all possible edges, are chosen uniformly 
over a choice of edges connecting variable and check nodes that are “close” to each other. 

In practice, a Tanner graph is often modified to prevent interconnections between check and variable
nodes that are “too far” apart, since these result in long wire lengths and thus higher energy [18]. Simulation in
a particular case can analyze whether this technique is worth the possible code performance trade-off. 
Currently, however, the common technique of generating an LDPC ensemble and analyzing average code 
performance does not consider energy complexity as a fundamental parameter to be traded-off with other 
code parameters. It seems likely that if “neighbors” of a variable node are restricted to those check nodes 
that are spatially close by, an LDPC code could still have good asymptotic performance if block lengths 
grow large. An analysis challenge of such a scheme may be to show that asymptotically a Tanner graph 
generated from such a distribution is locally tree-like. Furthermore, analysis of the required block length 
using such a technique to get good performance would be needed: even if asymptotically such schemes 
perform well, it may be that much longer block lengths are required for the same performance. The cost 
of possibly larger block length for such a scheme would have to be considered to determine whether it is 
worth it to have a slower scaling rule as a function of block length if it comes at a cost of much longer 
block length. 
Whether or not such a sequence of LDPC codes would give good performance is unclear. However,
in the following section we can use known bounds on the average node degree of an LDPC decoder as 
well as bounds on the area of graphs instantiated on a circuit to get scaling rules that are true for all 
directly-implemented capacity-approaching LDPC decoders, not just almost all. 


VI. Bounds for All LDPC Decoder Circuits 

We can find bounds for the energy complexity for all capacity-approaching directly-implemented LDPC 
codes (and not just almost all) by using the following Theorem: 


Theorem 2. If a circuit contains a graph $G = (V, E)$ that has no loops, according to the standard VLSI
model, the total area of a circuit that contains that graph is bounded as:

$$A_c \ge \frac{\lambda_w^2 \left(\sqrt{2} - 1\right)^2 |E|^2}{4 |V|}$$

where we recall that $\lambda_w$ is the wire width in the circuit, and $|E|$ and $|V|$ are the number of edges and
vertices in the graph, respectively.


The proof of this theorem uses a similar approach as used by Grover et al. in [4], in which the $A_c t$
complexity of circuits is related to the bits communicated within the circuit. The result of this paper,
however, is a bound on the area of a circuit instantiation of a graph as a function of the number of edges
and vertices in the graph. We use a similar nested bisection technique as the Grover et al. paper. The
proof is given in the appendix.

This result, combined with the results in [15] on the average edge degree as a function of gap to
capacity, results in the following corollary:


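Corollary 4. Any directly-instantiated LDPC decoder must have asymptotic energy that is
lower bounded by:

$$E_{dec} \ge \Omega\left(\frac{N \ln^2\left(\frac{1}{1 - \eta}\right)}{(1 - \eta)^2}\right)$$

and average energy per bit decoded that scales as

$$\frac{E_{dec}}{k} \ge \Omega\left(N \ln^2\left(\frac{1}{1 - \eta}\right)\right)$$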

where N is the number of iterations required to decode. 


Remark 4. Note that the number of iterations N in the above Corollary in general may be a function of 
the particular decoding algorithm instantiated and possibly the particular received vector. Our discussion 
does not analyze the number of iterations required, so we simply write our scaling rules in terms of this 
quantity. 


Proof: We can combine Sason’s [15] result that the average parity node degree of the Tanner graph
of a capacity-approaching LDPC code must scale as $\Omega\left(\ln\left(\frac{1}{1 - \eta}\right)\right)$ and that the minimum block length of
any code must scale as $\Omega\left(\frac{1}{(1 - \eta)^2}\right)$ [16], [17], meaning that $|E| \ge \Omega\left((n - k) \ln \frac{1}{1 - \eta}\right)$. Note also that
the number of nodes in this graph must be at least $|V| = 2n - k = O(n)$. Combining these results along
with Theorem 2 results in the scaling laws in the corollary.

We note that this lower bound on directly-implemented Tanner graphs contrasts with the lower bounds
in [5], which show an $\Omega\left(\sqrt{\ln \frac{1}{1 - \eta}}\right)$ lower bound for the per bit energy complexity of fully-parallel
decoding algorithms as a function of gap to capacity. This result means that directly-instantiated LDPC
decoders are necessarily asymptotically worse than this lower bound (albeit a lot closer than the $\Omega\left(\frac{1}{(1 - \eta)^2}\right)$
Lower Bound Scaling Rule Per Bit

Almost all directly instantiated LDPC decoders 

»L ,n ’(A)) 

Almost all LDPC decoders 

"(4*) 

All LDPC with Tanner Graph Directly Implemented 

°( N ln2 fe)) 

All Fully-Parallel Decoders 


"(iMAf)) 


TABLE I

Summary of the scaling rule lower bounds derived in this paper. We present these bounds as a function of $\eta = R/C$.
In the first three scaling rules presented, $N$ is the number of iterations required (which in general may be a
function of the actual LDPC code instantiated, as well as the particular received vector). For comparison, we
also include a result on lower bounds for all fully-parallel decoders given in [5].


almost sure lower bound of Corollary 2). Of course, it is not known whether the lower bounds of the
paper in [5] are tight, but Corollary 4 proves that directly instantiated LDPC decoders cannot reach these
lower bounds in an asymptotic sense.


VII. Conclusion 

The main contribution of this paper is graph theoretic in nature. We have shown that subject to a mild
condition on node degree distributions, almost all Tanner graph instantiations have a minimum bisection
width that scales as $\Omega(n)$ where $n$ is the number of left nodes. The minimum bisection width of a graph is
related to the area of circuit implementations of these graphs. We have used this result to show that almost
all LDPC decoders that directly instantiate their Tanner graph must have circuit area, and thus energy, that
scales as $\Omega(n^2)$. We can use this result to provide a scaling rule for the energy complexity of almost all
capacity-approaching LDPC decoders. We have further presented a general theorem on the area of circuits
that instantiate any graph to further bound the area of any LDPC decoder that approaches capacity. These
results are summarized in Table I. Note that our results show that directly-instantiated LDPC codes cannot
reach the lower bounds presented in [5], thus indicating that either the lower bound cited is not tight, or
directly-instantiated LDPC codes are asymptotically not optimal from this energy perspective. It may also be
that both are true, namely that known lower bounds are not tight and LDPC codes are not asymptotically
optimal. This remains an open question.


Appendix A 
Proof Of Lemma[6] 

Proof: (of Lemma [6]) Let the set of graphs in $\mathcal{B}(\Lambda, P)$ having a bisection of size $a$ be denoted by 
$B_a$. Then we can say that, according to the uniform configuration model, the probability of the event of 
generating a configuration with a bisection of size $a$ is given by: 

$$P(B_a) = \frac{|B_a|}{|\mathcal{B}(\Lambda, P)|},$$

i.e., it is the cardinality of the set of such configurations divided by the total number of configurations in 
$\mathcal{B}(\Lambda, P)$ with node degrees $\Lambda$ and $P$. 
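As a sanity check on this definition, the probability can be computed exactly for very small degree sequences by enumerating every socket permutation. The sketch below is only an illustration: the function names and the example degree sequences are our own assumptions, the search is brute force, and it assumes an even total number of vertices. It tabulates the minimum bisection width, so the probability of having a bisection of size $a$ or less is obtained by summing the entries with width at most $a$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations


def min_bisection_width(num_vertices, edges):
    """Brute-force minimum bisection width of a multigraph with an even vertex count."""
    half = num_vertices // 2
    best = len(edges)
    for top in combinations(range(num_vertices), half):
        top_set = set(top)
        # An edge crosses the bisection when exactly one endpoint is a top vertex.
        crossing = sum((u in top_set) != (v in top_set) for u, v in edges)
        best = min(best, crossing)
    return best


def bisection_width_distribution(left_degrees, right_degrees):
    """Exact distribution of the minimum bisection width over all socket permutations."""
    n = len(left_degrees)
    # Each socket is labelled by the node that owns it.
    left_sockets = [u for u, d in enumerate(left_degrees) for _ in range(d)]
    right_sockets = [n + v for v, d in enumerate(right_degrees) for _ in range(d)]
    if len(left_sockets) != len(right_sockets):
        raise ValueError("left and right socket counts must match")
    num_vertices = n + len(right_degrees)

    counts = Counter()
    total = 0
    # Every ordering of the right sockets is one equally likely configuration.
    for perm in permutations(right_sockets):
        edges = list(zip(left_sockets, perm))
        counts[min_bisection_width(num_vertices, edges)] += 1
        total += 1
    return {width: Fraction(c, total) for width, c in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical toy degree sequences: 4 left nodes, 2 right nodes, 6 edges.
    for width, prob in bisection_width_distribution([1, 1, 2, 2], [3, 3]).items():
        print(f"P(minimum bisection width = {width}) = {prob}")
```

The enumeration grows as $|E|!$, so it is only usable for a handful of edges; the counting argument that follows is what handles large $n$.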

We will now bound the number of configurations in $\mathcal{B}(\Lambda, P)$ with a bisection of size $a$, and we will 
assume that $a < \alpha n$. To do so, we will define a quadrant configuration, show that the number of quadrant 
configurations with a bisection of size $a$ is greater than or equal to $|B_a|$, and then upper bound the number 
of quadrant configurations with a bisection of size $a$ or less. 

A quadrant configuration of a bipartite configuration $G = (V_L \cup V_R, E)$ is an ordered tuple 
$Q = (G, T_L, T_R, B_L, B_R)$ where the vertices are divided into 4 disjoint sets, the top left vertices ($T_L$), the 
top right vertices ($T_R$), the bottom left vertices ($B_L$), and the bottom right vertices ($B_R$), in which 
$T_L, B_L \subseteq V_L$, $T_R, B_R \subseteq V_R$ and $\big|\, |T_R \cup T_L| - |B_L \cup B_R| \,\big| \le 1$. Naturally, vertices in $T_L$ are considered 
top left vertices, or, interchangeably, top left nodes, and similarly for the other sets of vertices in a quadrant 
configuration. Furthermore, vertices in $T_L$ and $T_R$ are considered to be top vertices or top nodes, and 
similarly for the bottom vertices. 

Fig. 3. An example of a quadrant configuration associated with a degree distribution where all the left nodes have degree 2 and all the 
right nodes have degree 4, in which the number of left nodes n = 8 and number of right nodes m = 4. The fully drawn configuration on 
the right is a quadrant configuration in $Q_4^{4,2}$. Recall that the superscript denotes that there are i = 4 top left nodes and j = 2 edges leading 
from top left nodes to bottom right nodes. The subscript indicates that there are a = 4 edges between top and bottom nodes, and in this 
case we see that they cross a dotted line, indicating where the bisection occurs. The diagram on the left shows the drawing of a = 4 edges 
crossing between top and bottom nodes. The graph on the right shows a permutation of the remaining sockets in both the top and bottom 
nodes. 

Note that every bipartite graph has at least one quadrant configuration induced by arbitrarily dividing 
the vertices in half, and denoting one half of these vertices top vertices and the other half bottom 
vertices. Thus, the set of quadrant configurations with a particular degree distribution is at least as 
big as the set of configurations with a particular degree distribution. Because a quadrant configuration 
$Q = (G, T_L, T_R, B_L, B_R)$ contains a graph $G$, graph properties can be extended to describe a quadrant 
configuration. So, for example, if we say that a quadrant configuration has minimum bisection width a, 
we mean precisely that the graph G within the quadrant configuration has minimum bisection width a. 

Denote the set of quadrant configurations with set node degree distributions $\Lambda$ and $P$ in which $a$ is the 
number of edges leading from top vertices to bottom vertices as $Q_a$. Note that the dependence of $Q_a$ on 
a particular node degree distribution is implicit. Observe that every configuration with a bisection of size 
$a$ has a corresponding quadrant configuration in $Q_a$ created in the natural way by denoting one bisected 
set of vertices as the top vertices, and the other the bottom vertices. Thus $|B_a| \le |Q_a|$. 

For ease of discussion, we will assume that the total number of nodes $m + n$ in the set of configurations 
under discussion is even, so that $\frac{m+n}{2}$ is an integer. 

Denote the set of quadrant configurations with a bisection of size $a$ in which there are $i$ top left nodes 
and $j$ edges connecting top left vertices to the bottom right by $Q_a^{i,j}$. This of course implies that there are 
$\frac{m+n}{2} - i$ top right nodes and $a - j$ edges leading from the bottom left to the top right nodes. We can see 
in Figure [3] an example of such an element that we are counting for the case of $n = 8$ and $a = 4$, $i = 4$ 
and $j = 2$. Note then that 

$$Q_a = \bigcup_{i=0}^{n} \bigcup_{j=0}^{a} Q_a^{i,j}.$$

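We bound the size of $Q_a^{i,j}$ by counting all quadrant configurations with a bisection of size $a$ that are the 
edges connecting top nodes to bottom nodes. We have 

$$|Q_a^{i,j}| \le \overbrace{\binom{n}{i}}^{a}\; \overbrace{\binom{m}{\frac{m+n}{2}-i}}^{b}\; \overbrace{\binom{|E|}{j}^{2} \binom{|E|}{a-j}^{2}}^{c}\; \underbrace{j!\,(a-j)!}_{d}\; \underbrace{(\delta n)!\,(\alpha n - a)!}_{e} \tag{13}$$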

where 

a. Represents a choice of $i$ top left nodes. 

b. Represents a choice of $\frac{m+n}{2} - i$ top right nodes from the $m$ total right nodes. 

c. The quantity $\binom{|E|}{j}$ is an upper bound on the number of choices of $j$ sockets that will have edges that 
cross the bisection line chosen from the top variable nodes, and $\binom{|E|}{j}$ is an upper bound on the number of 
choices for the bottom right sockets to which these edges will be connected. For a configuration in $Q_a^{i,j}$ 
there must also be $a - j$ edges leading from the bottom left to the top right. The quantity $\binom{|E|}{a-j}$ is an upper 
bound on the number of choices of sockets in the bottom left that can have edges crossing the middle 
bisection, and similarly $\binom{|E|}{a-j}$ is an upper bound on the number of choices for the sockets connected in 
the top right. 

d. Counts the number of permutations of edges that join the top half to the bottom half (first counting 
the $j$ connections from the top left nodes to the bottom right nodes, then the $a - j$ connections from the 
bottom left nodes to the top right nodes). 

e. This step in the quadrant configuration construction process involves permuting the connections of 
the remaining sockets in the top half and the bottom half. At this point it is not clear how 
many sockets are in the top half or the bottom half; however, we can upper bound the number of 
permutations possible. The number of sockets available in the top left vertices must equal the number of 
sockets available in the top right vertices (because in order to construct a valid configuration this must be 
true). By construction, the total number of nodes in the top left and top right is $\frac{m+n}{2}$, and thus the number 
of sockets available cannot exceed $\delta n$, by Lemma [5]. Suppose the number of sockets available for all the 
top left nodes is $M$ and the number of sockets available in the bottom left nodes is $N$. Then there are at most $M!\,N!$ 
ways to permute these. We also know that $M + N = |E| - a$ (since the total number of sockets available 
on one side of the constructed quadrant configuration is $|E|$ and $a$ have been used to cross between top 
nodes and bottom nodes), and that $M \le \delta n$ and $N \le \delta n$. Subject to these restrictions, a direct application 
of Lemma [4] implies $M!\,N! \le (\delta n)!\,(|E| - \delta n - a)! = (\delta n)!\,(\alpha n - a)!$. 
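The last step of item e can be checked numerically for small parameters. The sketch below is not taken from Lemma [4] itself; it only tests the form in which the lemma is used here, namely that with $M + N$ fixed and both $M$ and $N$ capped at $\delta n$, the product $M!\,N!$ is largest when one of the two sits at the cap. The function names and the sample values of $\delta n$, $\alpha n$ and $a$ are hypothetical.

```python
from math import factorial


def max_factorial_product(total, cap):
    """Largest M! * N! over integers with M + N = total and 0 <= M, N <= cap."""
    best = 0
    for m in range(max(0, total - cap), min(cap, total) + 1):
        best = max(best, factorial(m) * factorial(total - m))
    return best


def capped_extreme_value(total, cap):
    """Value of M! * N! when one variable is pushed to the cap."""
    return factorial(cap) * factorial(total - cap)


if __name__ == "__main__":
    # Hypothetical sizes: delta*n = 12 and alpha*n = 8, so |E| = 20, with a = 3.
    delta_n, alpha_n, a = 12, 8, 3
    total = delta_n + alpha_n - a  # M + N = |E| - a
    lhs = max_factorial_product(total, delta_n)
    rhs = factorial(delta_n) * factorial(alpha_n - a)
    print(lhs == rhs == capped_extreme_value(total, delta_n))
```

The check relies on $M!\,(T-M)!$ being symmetric about $T/2$ and growing towards the ends of the allowed interval, so the brute-force maximum coincides with the capped endpoint.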

Now, for the sake of simplicity, we will further loosen these bounds by upper bounding each of the 
factors a, b, c, and d. Each of these bounds is easily verified: 

a. We note that $\binom{n}{i} \le \binom{n}{n/2}$. 

b. Since $m \le n$, $\binom{m}{\frac{m+n}{2}-i} \le \binom{n}{n/2}$. 

c. $\binom{|E|}{j}^{2} \binom{|E|}{a-j}^{2} \le \binom{|E|}{a}^{4}$, which is implied by $a < \alpha n < \frac{|E|}{2}$. 

d. $j!\,(a-j)! \le a!$, which follows directly from the observation that $\binom{a}{j} \ge 1$. 
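The four loosened bounds are elementary, and a short exhaustive check over all admissible $i$ and $j$ can confirm them for given sizes. This is a sketch under our own assumptions: $n$ is taken to be even so that $n/2$ is an integer, the condition on $a$ is relaxed to $2a < |E|$, and the function name is hypothetical.

```python
from math import comb, factorial


def loosened_bounds_hold(n, m, num_edges, a):
    """Check bounds a-d for every admissible i and j (n assumed even)."""
    if n % 2 or (m + n) % 2 or m > n or 2 * a >= num_edges:
        raise ValueError("expected even n, even m + n, m <= n and 2a < |E|")
    half = (m + n) // 2
    central = comb(n, n // 2)
    for i in range(n + 1):
        if comb(n, i) > central:  # bound a
            return False
        top_right = half - i
        if 0 <= top_right <= m and comb(m, top_right) > central:  # bound b
            return False
    for j in range(a + 1):
        socket_choices = comb(num_edges, j) ** 2 * comb(num_edges, a - j) ** 2
        if socket_choices > comb(num_edges, a) ** 4:  # bound c
            return False
        if factorial(j) * factorial(a - j) > factorial(a):  # bound d
            return False
    return True


if __name__ == "__main__":
    # Sizes of the Fig. 3 example: n = 8, m = 4, |E| = 16, a = 4.
    print(loosened_bounds_hold(8, 4, 16, 4))
```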

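Combining these gives us the following bound: 

$$|Q_a^{i,j}| \le \binom{n}{n/2}^{2} \binom{|E|}{a}^{4} a!\,(\delta n)!\,(\alpha n - a)!.$$

We can bound $|Q_a|$ by summing over our upper bound on $|Q_a^{i,j}|$: 

$$|Q_a| \le \sum_{i=1}^{n} \sum_{j=1}^{a} |Q_a^{i,j}| \le n^{2} \binom{n}{n/2}^{2} \binom{|E|}{a}^{4} a!\,(\delta n)!\,(\alpha n - a)! \tag{14}$$

We of course are not concerned with the probability of a bisection of size $a$, but rather with the probability 
of a bisection of size $a$ or less. We denote the set of quadrant configurations with a bisection of size $a$ or less by 
$Q_a^*$, and since $Q_a^* = \bigcup_{i=0}^{a} Q_i$, 

$$|Q_a^*| \le \sum_{i=0}^{a} |Q_i|.$$

Analogously, we denote by $B_a^*$ the set of configurations with a bisection of size $a$ or less, so that $|B_a^*| \le |Q_a^*|$. 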


We will now show that the expression in (14) is a non-decreasing function of $a$ for $0 \le a < \alpha n$. Let 
the right side of the expression be denoted $d_a$; then it suffices to show that $\frac{d_{a+1}}{d_a}$ is greater than or equal 
to 1. It is easy to show that 

$$\frac{d_{a+1}}{d_a} = \frac{\binom{|E|}{a+1}^{4} (a+1)}{\binom{|E|}{a}^{4} (\alpha n - a)}.$$

Expanding the binomial coefficients in the numerator and denominator and simplifying gives us 

$$\frac{d_{a+1}}{d_a} = \frac{(|E| - a)^{4}}{(a+1)^{3} (\alpha n - a)}.$$

This quantity will be greater than or equal to 1 if $|E| - a \ge a + 1$ and $|E| - a \ge \alpha n - a$. Note that $a < \alpha n$ 
(an assumption of our lemma) implies $2a < 2\alpha n < |E|$. Since $a$ and $|E|$ are both integers, this implies 
$2a \le |E| - 1$, from which we can see that the first inequality is satisfied. The second is satisfied by the 
fact that $\alpha n < |E|$. We thus observe that 

$$|B_a^*| \le |Q_a^*| \le \sum_{i=0}^{a} |Q_i| \le \sum_{i=0}^{a} n^{2} \binom{n}{n/2}^{2} \binom{|E|}{i}^{4} i!\,(\delta n)!\,(\alpha n - i)! \le (a+1)\, n^{2} \binom{n}{n/2}^{2} \binom{|E|}{a}^{4} a!\,(\delta n)!\,(\alpha n - a)! \tag{15}$$
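The monotonicity argument can be checked with exact rational arithmetic. The sketch below evaluates the right side of (14) directly, compares the ratio of consecutive terms with the simplified closed form, and confirms that the ratio is at least 1 on the whole range. The parameter values are hypothetical; we assume $n$ even, integer $\delta n$ and $\alpha n$, $|E| = \delta n + \alpha n$, and $\alpha n < \delta n$ so that $2\alpha n < |E|$.

```python
from fractions import Fraction
from math import comb, factorial


def d(a, n, delta_n, alpha_n):
    """Right side of (14): n^2 C(n, n/2)^2 C(|E|, a)^4 a! (delta n)! (alpha n - a)!."""
    num_edges = delta_n + alpha_n
    return (n ** 2 * comb(n, n // 2) ** 2 * comb(num_edges, a) ** 4
            * factorial(a) * factorial(delta_n) * factorial(alpha_n - a))


def ratio_closed_form(a, delta_n, alpha_n):
    """Simplified ratio d_{a+1} / d_a = (|E| - a)^4 / ((a + 1)^3 (alpha n - a))."""
    num_edges = delta_n + alpha_n
    return Fraction((num_edges - a) ** 4, (a + 1) ** 3 * (alpha_n - a))


def is_nondecreasing(n, delta_n, alpha_n):
    """True if d_a is non-decreasing for 0 <= a < alpha n and matches the closed form."""
    for a in range(alpha_n):
        ratio = Fraction(d(a + 1, n, delta_n, alpha_n), d(a, n, delta_n, alpha_n))
        if ratio != ratio_closed_form(a, delta_n, alpha_n) or ratio < 1:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical sizes: n = 8, delta*n = 16, alpha*n = 8, so |E| = 24.
    print(is_nondecreasing(8, 16, 8))
```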

We note that the number of possible multi-graphs with our given node degree distribution is at least 
$(\delta n + \alpha n)!$. We can now bound the probability of the event $B_a^*$ with: 

$$P(B_a^*) \le \frac{|B_a^*|}{(\delta n + \alpha n)!} \tag{16}$$

$$\le \frac{(a+1)\, n^{2} \binom{n}{n/2}^{2} \binom{|E|}{a}^{4} a!\,(\delta n)!\,(\alpha n - a)!}{(\delta n + \alpha n)!} \tag{17}$$

where we have simply applied the upper bound for the size of $B_a^*$ of (15). 
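For block lengths of practical interest the factorials in this bound overflow floating point, so it is convenient to evaluate its logarithm instead. The following sketch does this with the log-gamma function and cross-checks it against exact arithmetic for small sizes. The function names and parameter values are hypothetical, $n$ is assumed even, and $\delta n$ and $\alpha n$ are assumed to be integers; nothing here asserts that the bound is small for any particular choice of parameters.

```python
from fractions import Fraction
from math import comb, factorial, lgamma, log


def exact_probability_bound(a, n, delta_n, alpha_n):
    """Final upper bound on P(B_a^*) as an exact fraction (small sizes only)."""
    num_edges = delta_n + alpha_n
    numerator = ((a + 1) * n ** 2 * comb(n, n // 2) ** 2 * comb(num_edges, a) ** 4
                 * factorial(a) * factorial(delta_n) * factorial(alpha_n - a))
    return Fraction(numerator, factorial(num_edges))


def log_factorial(x):
    """Natural log of x! via the log-gamma function."""
    return lgamma(x + 1)


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return log_factorial(n) - log_factorial(k) - log_factorial(n - k)


def log_probability_bound(a, n, delta_n, alpha_n):
    """Natural log of the same bound, usable for large n."""
    num_edges = delta_n + alpha_n
    return (log(a + 1) + 2 * log(n) + 2 * log_comb(n, n // 2)
            + 4 * log_comb(num_edges, a) + log_factorial(a)
            + log_factorial(delta_n) + log_factorial(alpha_n - a)
            - log_factorial(num_edges))


if __name__ == "__main__":
    # Hypothetical sizes; a negative value means the bound is below 1.
    print(log_probability_bound(a=10, n=1000, delta_n=3000, alpha_n=1000))
```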


Appendix B 
Proof of Theorem [2] 

In this section we will prove Theorem [2], which states that if a circuit implements a graph $G = (V, E)$ 
that has no loops, according to the standard VLSI model, the total area of that circuit is bounded by: 

$$A_c \ge \frac{\lambda_w^{2} (\sqrt{2}-1)^{2} |E|^{2}}{4 |V|}$$

where $\lambda_w$ is the wire width in the circuit, and $|E|$ and $|V|$ are the number of edges and vertices in the 
graph, respectively. 

Proof: (Of Theorem [2]) For simplicity we will say the graph has $|V| = 2^k$ vertices. Recall that a 
bisection of a graph is a set of edges of that graph whose removal divides the vertices in half. A minimum bisection 
of a graph is a bisection that uses the smallest number of edges to bisect the graph. We will perform what 
we call nested minimum bisections on the graph. To do this, first, the edges in a minimum bisection of 
the graph are removed, and there are $b_{1,1}$ such edges. This divides the graph into two distinct components. 
Then, these two components (which are subgraphs of the original graph) are bisected by removing edges 
in their respective minimum bisection cut, and so $b_{2,1}$ and $b_{2,2}$ edges are removed. This process continues 
for $k$ bisections, and after the $k$th bisection, there are $2^k$ disjoint subgraphs, each with one vertex, and 
no edges (because we assume that in these graphs there are no loops). It must be that the total of all the 
edges we removed equals the total number of edges in our graph; in other words, it must be that: 

$$\sum_{i=1}^{k} \sum_{j=1}^{2^{i-1}} b_{i,j} = |E|. \tag{18}$$
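The nested minimum bisection procedure can be illustrated on a small graph. The sketch below is a brute-force illustration with hypothetical function names, practical only for a handful of vertices; it assumes $2^k$ vertices and no loops, returns the widths $b_{i,j}$ level by level, and lets us confirm the edge-count identity (18) on the 3-dimensional cube graph.

```python
from itertools import combinations


def min_bisection(vertices, edges):
    """Brute-force minimum bisection: returns (cut size, top set, bottom set)."""
    vertices = list(vertices)
    half = len(vertices) // 2
    best = None
    for top in combinations(vertices, half):
        top_set = set(top)
        cut = sum((u in top_set) != (v in top_set) for u, v in edges)
        if best is None or cut < best[0]:
            best = (cut, top_set, set(vertices) - top_set)
    return best


def nested_bisection_widths(vertices, edges):
    """Return [[b_11], [b_21, b_22], ...] for a loop-free graph on 2**k vertices."""
    levels = []
    current = [(list(vertices), list(edges))]
    while len(current[0][0]) > 1:
        widths, next_level = [], []
        for verts, sub_edges in current:
            cut, top, bottom = min_bisection(verts, sub_edges)
            widths.append(cut)
            # Keep only the edges that stay inside each half.
            for part in (top, bottom):
                inner = [e for e in sub_edges if e[0] in part and e[1] in part]
                next_level.append((sorted(part), inner))
        levels.append(widths)
        current = next_level
    return levels


def cube_edges():
    """Edges of the 3-dimensional cube graph on vertices 0..7."""
    return [(u, u ^ (1 << b)) for u in range(8) for b in range(3) if u < u ^ (1 << b)]


if __name__ == "__main__":
    levels = nested_bisection_widths(range(8), cube_edges())
    print(levels, sum(map(sum, levels)) == len(cube_edges()))
```

For the cube every minimum bisection separates two opposite faces, so each level removes four edges in total and the three levels account for all twelve edges.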

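Recall Thompson's bound from Lemma [2] that says for a circuit implementation of a graph with minimum 
bisection width $\omega$, the area of that circuit is lower bounded by: 

$$\frac{4 A_c}{\lambda_w^{2}} \ge \omega^{2}.$$

We can use this result to bound the total area of all of the subgraphs for each level $i = 1, 2, \ldots, k$, 

$$\frac{4 A_c}{\lambda_w^{2}} \ge \sum_{j=1}^{2^{i-1}} b_{i,j}^{2}.$$

Thus, 

$$\frac{4 A_c}{\lambda_w^{2}} \ge \max\left( b_{1,1}^{2},\; b_{2,1}^{2} + b_{2,2}^{2},\; \ldots,\; \sum_{j=1}^{2^{k-1}} b_{k,j}^{2} \right).$$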


Standard convex optimization techniques imply that this expression is minimized when: 

$$c_1 = b_{1,1}$$
$$c_2 = b_{2,1} = b_{2,2}$$
$$\vdots$$
$$c_k = b_{k,1} = b_{k,2} = \ldots = b_{k,2^{k-1}} \tag{19}$$

where for the sake of convenience we have introduced the constants $c_1, c_2, \ldots, c_k$. Furthermore, it can be 
shown that 

$$b_{1,1}^{2} = b_{2,1}^{2} + b_{2,2}^{2} = \ldots = \sum_{j=1}^{2^{k-1}} b_{k,j}^{2} = a \tag{20}$$

for some $a$. Using the definitions of the constants $c_1, c_2, \ldots, c_k$ given in (19), and applying this to the 
above equation (20), we obtain 

$$c_1^{2} = 2 c_2^{2} = 4 c_3^{2} = \ldots = 2^{k-1} c_k^{2}$$

from which we can infer 

$$c_2 = \frac{1}{\sqrt{2}}\, c_1, \qquad c_3 = \frac{1}{\sqrt{2}}\, c_2 = \left(\frac{1}{\sqrt{2}}\right)^{2} c_1,$$

and, in general, 

$$c_i = \left(\frac{1}{\sqrt{2}}\right)^{i-1} c_1.$$

We then apply this to the constraint in (18) to give us 

$$|E| = \sum_{i=1}^{k} 2^{i-1} \left(\frac{1}{\sqrt{2}}\right)^{i-1} c_1 = c_1 \left( \frac{\sqrt{2}^{\,k} - 1}{\sqrt{2} - 1} \right)$$


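from which $c_1$ is easily obtained. Now, using that $k = \log_2 |V|$, we have $\sqrt{2}^{\,k} = \sqrt{2}^{\,\log_2 |V|} = |V|^{1/2}$. Hence, we conclude that 

$$\frac{4 A_c}{\lambda_w^{2}} \ge c_1^{2} = \left( \frac{\sqrt{2} - 1}{\sqrt{|V|} - 1} \right)^{2} |E|^{2} \ge \frac{(\sqrt{2} - 1)^{2} |E|^{2}}{|V|}.$$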
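The closing steps of the proof can be verified numerically. The sketch below, with hypothetical function names and sample sizes, computes the equalising widths $c_i$ from the geometric-series constraint, checks that they reproduce $|E|$ and give every level the same sum of squares, and compares $\lambda_w^2 c_1^2 / 4$ with the closed-form bound of Theorem [2].

```python
from math import isclose, sqrt


def optimal_level_widths(num_edges, k):
    """Widths c_i = (1/sqrt(2))**(i-1) * c_1 with sum_i 2**(i-1) * c_i = |E|."""
    c1 = num_edges * (sqrt(2) - 1) / (sqrt(2) ** k - 1)
    return [c1 * (1 / sqrt(2)) ** (i - 1) for i in range(1, k + 1)]


def area_lower_bound(num_edges, num_vertices, wire_width=1.0):
    """Closed-form bound: wire_width^2 (sqrt(2) - 1)^2 |E|^2 / (4 |V|)."""
    return wire_width ** 2 * (sqrt(2) - 1) ** 2 * num_edges ** 2 / (4 * num_vertices)


if __name__ == "__main__":
    # Hypothetical sizes: |V| = 2**3 = 8 vertices and |E| = 12 edges.
    num_edges, k = 12, 3
    widths = optimal_level_widths(num_edges, k)
    total = sum(2 ** i * c for i, c in enumerate(widths))
    print(isclose(total, num_edges))
    # The intermediate bound c_1^2 / 4 is at least the closed-form bound.
    print(widths[0] ** 2 / 4 >= area_lower_bound(num_edges, 2 ** k))
```

The closed form is slightly weaker than $\lambda_w^2 c_1^2 / 4$ because $(\sqrt{|V|} - 1)^2$ is replaced by the larger quantity $|V|$ in the denominator.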